the number and location of the servers are to be determined. The standard approach in the 
literature is to enforce that all requests of a client be served by the closest server in the tree. 
We introduce and study two new policies. In the first policy, all requests from a given client are 
still processed by the same server, but this server can be located anywhere in the path from the 
client to the root. In the second policy, the requests of a given client can be processed by multiple 
servers. 

One major contribution of this paper is to assess the impact of these new policies on the total 
replication cost. Another important goal is to assess the impact of server heterogeneity, both from 
a theoretical and a practical perspective. In this paper, we establish several new complexity results, 
and provide several efficient polynomial heuristics for NP-complete instances of the problem. These 
heuristics are compared to an absolute lower bound provided by the formulation of the problem 
in terms of the solution of an integer linear program. 
Strategies for replica placement on trees

Abstract: In this report we present and compare several policies for placing replicas on trees,
taking into account both constraints related to the processing capacity of each server and
QoS (quality of service) constraints. The client requests are known before execution, while
the number and location of the replicas (servers) are to be determined by the placement
algorithm. The classical approach requires that all requests of a given client be processed by
a single server, namely the one closest to the client in the tree. We introduce two new placement
policies. In the first, each client still has a single server, but this server can be located
anywhere on the path from the client to the root of the tree. With the second policy, the
requests of a given client can be processed by several servers on this same path.

We show that these two new placement policies can strongly reduce the total replication cost.
Another objective of this work is to analyze the impact of platform heterogeneity, from both a
theoretical and a practical point of view. On the theoretical side, we establish several
complexity results, in the homogeneous and heterogeneous settings, for the classical approach
and for the new policies. On the practical side, we design polynomial heuristics for the
combinatorial instances of the problem. We compare the performance of these heuristics against
an absolute lower bound on the total replication cost; this bound is obtained by relaxing an
integer linear program that characterizes the optimal solution of the problem.

Keywords: replica placement, tree networks, scheduling, complexity, heuristics,
heterogeneous computing clusters.
1 Introduction 

In this paper, we consider the general problem of replica placement in tree networks. Informally, 
there are clients issuing requests to be satisfied by servers. The clients are known (both their 
position in the tree and their number of requests), while the number and location of the servers
are to be determined. A client is a leaf node of the tree, and its requests can be served by one
or several internal nodes. Initially, there are no replicas; when a node is equipped with a replica,
it can process a number of requests, up to its capacity limit. Nodes equipped with a replica, also 
called servers, can only serve clients located in their subtree (so that the root, if equipped with a 
replica, can serve any client); this restriction is usually adopted to enforce the hierarchical nature 
of the target application platforms, where a node has knowledge only of its parent and children in 
the tree. 

The rule of the game is to assign replicas to nodes so that some optimization function is 
minimized. Typically, this optimization function is the total utilization cost of the servers. If 
all the nodes are identical, this reduces to minimizing the number of replicas. If the nodes are 
heterogeneous, it is natural to assign a cost proportional to their capacity (so that one replica on a 
node capable of handling 200 requests is equivalent to two replicas on nodes of capacity 100 each). 

The core of the paper is devoted to the study of the previous optimization problem, called 
Replica Placement in the following. Additional constraints are introduced, such as guarantee- 
ing some Quality of Service (QoS): the requests must be served in limited time, thereby prohibiting 
too remote or hard-to-reach replica locations. Also, the flow of requests through a link in the tree 
cannot exceed some bandwidth-related capacity. We focus on optimizing the total utilization cost 
(or replica number in the homogeneous case). There are many possible extensions: dealing with
several object types rather than one, including communication time in the objective function,
taking into account an update cost of the replicas, and so on. For the sake of clarity we devote
a special section (Section 8) to formulating these extensions, and to describing the situations in
which our results and algorithms still apply.

We point out that the distribution tree (clients and nodes) is fixed in our approach. This 
key assumption is quite natural for a broad spectrum of applications, such as electronic, ISP, or 
VOD service delivery. The root server has the original copy of the database but cannot serve all 
clients directly, so a distribution tree is deployed to provide a hierarchical and distributed access 
to replicas of the original data. On the contrary, in other, more decentralized, applications (e.g. 
allocating Web mirrors in distributed networks), a two-step approach is used: first determine 
a "good" distribution tree in an arbitrary interconnection graph, and then determine a "good" 
placement of replicas among the tree nodes. Both steps are interdependent, and the problem is 
much more complex, due to the combinatorial solution space (the number of candidate distribution 
trees may well be exponential). 

Many authors deal with the Replica Placement optimization problem, and we survey related 
work in Section 9. The objective of this paper is twofold: (i) introducing two new access policies
and comparing them with the standard approach; (ii) assessing the impact of server heterogeneity 
on the problem. 

In most, if not all, papers from the literature, all requests of a client are served by the closest 
replica, i.e. the first replica found in the unique path from the client to the root in the distribution 
tree. This Closest policy is simple and natural, but may be unduly restrictive, leading to a waste 
of resources. We introduce and study two different approaches: in the first one, we keep the 
restriction that all requests from a given client are processed by the same replica, but we allow 
client requests to "traverse" servers so as to be processed by other replicas located higher in the 
path (closer to the root). We call this approach the Upwards policy. The trade-off to explore is the
following: the Closest policy assigns replicas close to the clients, but may need to allocate
too many of them if some local subtree issues a great number of requests. The Upwards policy
will ensure better resource usage, load-balancing the processing of requests on a larger scale; the
possible drawback is that requests will be served by remote servers, which are likely to take longer to
process them. Taking QoS constraints into account would typically be more important for the
Upwards policy. 
In the second approach, we further relax access constraints and grant the possibility for a client
to be assigned several replicas. With this Multiple policy, the processing of a given client's requests 
will be split among several servers located in the tree path from the client to the root. Obviously, 
this policy is the most flexible, and likely to achieve the best resource usage. The only drawback 
is the (modest) additional complexity induced by the fact that requests must now be tagged with 
the replica server ID in addition to the client ID. As already stated, one major objective of this 
paper is to compare these three access policies, Closest, Upwards and Multiple. 

The second major contribution of the paper is to assess the impact of server heterogeneity, 
both from a theoretical and a practical perspective. Recently, several variants of the Replica 
Placement optimization problem with the Closest policy have been shown to have polynomial 
complexity. In this paper, we establish several new complexity results. Those for the homogeneous 
case are surprising: for the simplest instance, with neither QoS nor bandwidth constraints, the Multiple
policy is polynomial (as is Closest) while Upwards is NP-hard. The three policies turn out to be NP-complete
for heterogeneous nodes, which provides yet another example of the additional difficulties
induced by resource heterogeneity. On the more practical side, we provide an optimal algorithm 
for the Multiple problem with homogeneous nodes, and several heuristics for all three policies in 
the heterogeneous case. We compare these heuristics through simulations conducted for problem 
instances with neither QoS nor bandwidth constraints. Another contribution is that we are able to
assess the absolute performance of the heuristics, not just comparing one to the other, owing to a 
lower bound provided by a new formulation of the Replica Placement problem in terms of an 
integer linear program: the relaxation of this program to the rational numbers provides a lower 
bound to the solution cost (which is not always feasible). 

The rest of the paper is organized as follows. Section 2 is devoted to a detailed presentation of
the target optimization problems. In Section 3 we introduce the three access policies, and we give
a few motivating examples. Next, in Section 4, we proceed to the complexity results for the simplest
version of the Replica Placement problem, both in the homogeneous and heterogeneous cases.
Section 5 deals with the formulation of the Replica Placement problem in terms of an integer
linear program. In Section 6 we introduce several polynomial heuristics to solve the Replica
Placement problem with the different access policies. These heuristics are compared through
simulations, whose results are analyzed in Section 7. Section 8 discusses various extensions to the
Replica Placement problem, while Section 9 is devoted to an overview of related work. Finally,
we state some concluding remarks in Section 10.

2 Framework 

This section is devoted to a precise statement of the Replica Placement optimization problem. 
We start with some definitions and notations. Next we outline the simplest instance of the problem. 
Then we describe several types of constraints that can be added to the formulation. 

2.1 Definitions and notations 

We consider a distribution tree $T$ whose nodes are partitioned into a set of clients $\mathcal{C}$ and a set of
nodes $\mathcal{N}$. The set of tree edges is denoted as $\mathcal{L}$. The clients are leaf nodes of the tree, while $\mathcal{N}$ is
the set of internal nodes. It would be easy to allow client-server nodes which play both the role
of a client and of an internal node (possibly a server), by dividing such a node into two distinct
nodes in the tree, connected by an edge with zero communication cost.

A client $i \in \mathcal{C}$ is making requests to database objects. For the sake of clarity, we restrict the
presentation to a single object type, hence a single database. We deal with several object types
in Section 8.

A node $j \in \mathcal{N}$ may or may not have been provided with a replica of the database. Nodes
equipped with a replica (i.e. servers) can process requests from clients in their subtree. In other 
words, there is a unique path from a client i to the root of the tree, and each node in this path is 
eligible to process some or all the requests issued by i when provided with a replica. 



RR n° 0123456789 






A. Benoit, V. Rehn, Y. Robert 



Let $r$ be the root of the tree. If $j \in \mathcal{N}$, then $\mathrm{children}(j)$ is the set of children of node $j$. If $k \neq r$
is any node in the tree (leaf or internal), $\mathrm{parent}(k)$ is its parent in the tree. If $l : k \to k' = \mathrm{parent}(k)$
is any link in the tree, then $\mathrm{succ}(l)$ is the link $k' \to \mathrm{parent}(k')$ (when it exists). Let $\mathrm{Ancestors}(k)$
denote the set of ancestors of node $k$, i.e. the nodes in the unique path that leads from $k$ up to
the root $r$ ($k$ excluded). If $k' \in \mathrm{Ancestors}(k)$, then $\mathrm{path}[k \to k']$ denotes the set of links in the path
from $k$ to $k'$; also, $\mathrm{subtree}(k)$ is the subtree rooted in $k$, including $k$.

We introduce more notations to describe our system in the following. 

• Clients $i \in \mathcal{C}$ — Each client $i$ (leaf of the tree) is sending $r_i$ requests per time unit. For such
requests, the required QoS (typically, a response time) is denoted $q_i$, and we need to ensure
that this QoS will be satisfied for each client.

• Nodes $j \in \mathcal{N}$ — Each node $j$ (internal node of the tree) has a processing capacity $W_j$, which
is the total number of requests that it can process per time unit when it has a replica. A
cost $sc_j$ is also associated with each node; it represents the price to pay to place a replica
at this node. With a single object type it is quite natural to assume that $sc_j$ is proportional
to $W_j$: the more powerful a server, the more costly. But with several objects we may use
unrelated values of capacity and cost.

• Communication links $l \in \mathcal{L}$ — The edges of the tree represent the communication links
between nodes (leaf and internal). We assign a communication time $\mathrm{comm}_l$ to link $l$, which
is the time required to send a request through the link. Moreover, $BW_l$ is the maximum
number of requests that link $l$ can transmit per time unit.

2.2 Problem instances 

For each client $i \in \mathcal{C}$, let $\mathrm{Servers}(i) \subseteq \mathcal{N}$ be the set of servers responsible for processing at least
one of its requests. We do not specify here which access policy is enforced (e.g. one or multiple
servers); we defer this to Section 3. Instead, we let $r_{i,s}$ be the number of requests from client $i$
processed by server $s$ (of course, $\sum_{s \in \mathrm{Servers}(i)} r_{i,s} = r_i$). In the following, $R$ is the set of replicas:

$R = \{ s \in \mathcal{N} \mid \exists i \in \mathcal{C},\ s \in \mathrm{Servers}(i) \}$.

2.2.1 Constraints 

Three main types of constraints are considered. 

Server capacity — The constraint that no server capacity can be exceeded is present in all vari- 
ants of the problem: 

$\forall s \in R, \quad \sum_{i \in \mathcal{C} \mid s \in \mathrm{Servers}(i)} r_{i,s} \le W_s$.

QoS — Some problem instances enforce a quality of service: the time to transfer a request from 
a client to a replica server is bounded by a quantity $q_i$. This translates into:

$\forall i \in \mathcal{C},\ \forall s \in \mathrm{Servers}(i), \quad \sum_{l \in \mathrm{path}[i \to s]} \mathrm{comm}_l \le q_i$.

Note that it would be easy to extend the QoS constraint so as to take into account the computation cost
of a request in addition to its communication cost. The computation cost is directly related to
the computational speed of the server and the amount of computation (in flops) required for 
each request. 

Link capacity — Some problem instances enforce a global constraint on each communication link 
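$l \in \mathcal{L}$:

$\forall l \in \mathcal{L}, \quad \sum_{i \in \mathcal{C},\, s \in \mathrm{Servers}(i) \mid l \in \mathrm{path}[i \to s]} r_{i,s} \le BW_l$.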
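
The three constraint families above can be checked mechanically for a candidate solution. The following is a minimal sketch, not part of the original report: the function name `is_valid`, the dictionary-based tree encoding, and the convention that a link is identified by its lower endpoint are all assumptions made for illustration.

```python
def is_valid(parent, comm, bw, capacity, qos, requests, assign):
    """Check server capacity, QoS and link capacity for one candidate solution.

    parent[k]      : parent of node k (the root has no entry)
    comm[k], bw[k] : time and bandwidth of the link k -> parent[k] (assumed encoding)
    capacity[s]    : W_s;  qos[i] : q_i;  requests[i] : r_i
    assign[(i, s)] : r_{i,s}, number of requests of client i processed by s
    """
    served, load, traffic = {}, {}, {}
    for (i, s), amount in assign.items():
        if amount <= 0:
            continue
        served[i] = served.get(i, 0) + amount
        load[s] = load.get(s, 0) + amount
        node, delay = i, 0
        while node != s:                      # walk the links of path[i -> s]
            if node not in parent:
                return False                  # s is not an ancestor of i
            delay += comm[node]
            traffic[node] = traffic.get(node, 0) + amount
            node = parent[node]
        if delay > qos[i]:
            return False                      # QoS constraint violated
    return (all(served.get(i, 0) == r for i, r in requests.items())
            and all(load[s] <= capacity[s] for s in load)
            and all(traffic[l] <= bw[l] for l in traffic))


# Hypothetical toy instance: root "r", internal node "a", clients "c1" and "c2".
parent = {"a": "r", "c1": "a", "c2": "a"}
comm = {"a": 1, "c1": 1, "c2": 1}
bw = {"a": 5, "c1": 5, "c2": 5}
capacity = {"a": 3, "r": 3}
qos = {"c1": 2, "c2": 1}
requests = {"c1": 2, "c2": 3}
print(is_valid(parent, comm, bw, capacity, qos, requests,
               {("c1", "r"): 2, ("c2", "a"): 3}))
```

In this toy instance, sending both clients to `a` would exceed its capacity, and sending `c2` to the root would violate its QoS bound of one hop.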
2.2.2 Objective function 

The objective function for the Replica Placement problem is defined as: 

$\min \sum_{s \in R} sc_s$

As already pointed out, it is frequently assumed that the cost of a server is proportional to its
capacity, so in some problem instances we let $sc_s = W_s$.

2.2.3 Simplified problems 

We define a few simplified problem instances in the following: 

QoS=distance — We can simplify the expression of the communication time in the QoS con- 
straint and only consider the distance (in number of hops) between a client and its server (s). 
The QoS constraint is then 

$\forall i \in \mathcal{C},\ \forall s \in \mathrm{Servers}(i), \quad d(i,s) \le q_i$

where the distance $d(i,s) = |\mathrm{path}[i \to s]|$ is the number of communication links between $i$
and $s$.

No QoS - We may further simplify the problem, by completely suppressing the QoS constraints. 
In this case, the servers can be anywhere in the tree, their location is indifferent to the client. 

No link capacity — We may consider the problem assuming infinite link capacity, i.e. not bound- 
ing the total traffic on any link in an admissible solution. 

Only server capacities — The problem without QoS and link capacities reduces to finding a 
valid solution of minimal cost, where "valid" means that no server capacity is exceeded. We 
name Replica Cost this fundamental problem. 

Replica counting — We can further simplify the previous Replica Cost problem in the homo- 
geneous case: with identical servers, the Replica Cost problem amounts to minimize the 
number of replicas needed to solve the problem. In this case, the storage cost $sc_j$ is set to 1
for each node. We call this problem Replica Counting. 

3 Access policies 

In this section we review the usual policies enforcing which replica is accessed by a given client. 
Consider that each client $i$ is making $r_i$ requests per time unit. There are two scenarios for the
number of servers assigned to each client: 

Single server — Each client i is assigned a single server server(i), that is responsible for processing 
all its requests. 

Multiple servers — A client $i$ may be assigned several servers in a set $\mathrm{Servers}(i)$. Each server
$s \in \mathrm{Servers}(i)$ will handle a share $r_{i,s}$ of the requests. Of course, $\sum_{s \in \mathrm{Servers}(i)} r_{i,s} = r_i$.

To the best of our knowledge, the single server policy has been enforced in all previous ap- 
proaches. One objective of this paper is to assess the impact of this restriction on the performance 
of data replication algorithms. The single server policy may prove a useful simplification, but may 
come at the price of a non-optimal resource usage. 

In the literature, the single server strategy is further constrained to the Closest policy. Here, 
the server of client i is constrained to be the first server found on the path that goes from i upwards 
to the root of the tree. In particular, consider a client i and its server server(i). Then any other 
client node $i'$ residing in the subtree rooted in $\mathrm{server}(i)$ will be assigned a server in that subtree.
This forbids requests from i' to "traverse" server(i) and be served higher (closer to the root in the 
tree). 

We relax this constraint in the Upwards policy, which is the general single server policy. Notice
that a solution to Closest is always a solution to Upwards; thus Upwards is always at least as good as
Closest in terms of the objective function. Similarly, the Multiple policy is always at least as good as
Upwards, because it is not constrained by the single server restriction.

The following sections illustrate the three policies. Section 3.1 provides simple examples where
there is a valid solution for a given policy, but none for a more constrained one. Section 3.2 shows
that Upwards can be arbitrarily better than Closest, while Section 3.3 shows that Multiple can
be arbitrarily better than Upwards. We conclude with an example showing that the cost of an
optimal solution of the Replica Counting problem (for any policy) can be arbitrarily higher
than the obvious lower bound

$\left\lceil \frac{\sum_{i \in \mathcal{C}} r_i}{W} \right\rceil$,

where $W$ is the server capacity.



3.1 Impact of the access policy on the existence of a solution 

We consider here a very simple instance of the Replica Counting problem. In this example
there are two nodes, $s_1$ being the unique child of $s_2$, the tree root (see Figure 1). Each node can
process $W = 1$ request.



Figure 1: Access policies.



• If $s_1$ has one client child making 1 request, the problem has a solution with all three policies,
placing a replica on $s_1$ or on $s_2$ indifferently (Figure 1(a)).

• If $s_1$ has two client children, each making 1 request, the problem no longer has a solution with
Closest. However, we have a solution with both Upwards and Multiple if we place replicas
on both nodes. Each server will process the request of one of the clients (Figure 1(b)).

• Finally, if $s_1$ has only one client child making 2 requests, only Multiple has a solution since
we need to process one request on $s_1$ and the other on $s_2$, thus requiring multiple servers
(Figure 1(c)).

This example demonstrates the usefulness of the new policies. The Upwards policy makes it possible to
find solutions when the classical Closest policy does not. The same holds true for Multiple versus
Upwards. In the following, we compare the cost of solutions obtained with different strategies.
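
The three cases of this example are small enough to be verified exhaustively. The sketch below is an illustration under stated assumptions, not an algorithm from the report: `min_replicas` is a hypothetical brute-force helper (exponential, only for toy instances), and the Multiple policy is handled by treating every single request as its own unit client, which is valid because the numbers of requests are integers.

```python
from itertools import combinations, product


def ancestors(parent, k):
    """Nodes on the path from k (excluded) up to the root, nearest first."""
    path = []
    while k in parent:
        k = parent[k]
        path.append(k)
    return path


def min_replicas(parent, requests, W, policy):
    """Smallest number of replicas for a policy, or None if no valid solution."""
    if policy == "multiple":
        # Split each client into unit clients, then solve the Upwards problem.
        units, par = {}, dict(parent)
        for c, r in requests.items():
            for u in range(r):
                units[(c, u)] = 1
                par[(c, u)] = parent[c]
        return min_replicas(par, units, W, "upwards")
    nodes = sorted(set(parent.values()))      # internal nodes
    clients = list(requests)
    for size in range(len(nodes) + 1):
        for repl in combinations(nodes, size):
            options = []
            for c in clients:
                servers = [a for a in ancestors(parent, c) if a in repl]
                if policy == "closest":
                    servers = servers[:1]     # only the first replica on the path
                options.append(servers)
            for choice in product(*options):
                load = {}
                for c, s in zip(clients, choice):
                    load[s] = load.get(s, 0) + requests[c]
                if all(v <= W for v in load.values()):
                    return size
    return None


# The three cases of Figure 1 (s1 is the unique child of the root s2, W = 1).
case_a = ({"s1": "s2", "c1": "s1"}, {"c1": 1})
case_b = ({"s1": "s2", "c1": "s1", "c2": "s1"}, {"c1": 1, "c2": 1})
case_c = ({"s1": "s2", "c1": "s1"}, {"c1": 2})
for name, (par, req) in (("a", case_a), ("b", case_b), ("c", case_c)):
    print(name, [min_replicas(par, req, 1, p)
                 for p in ("closest", "upwards", "multiple")])
```

A result of `None` means the policy admits no valid solution on that instance, which matches the discussion of cases (b) and (c).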



3.2 Upwards versus Closest 

In the following example, we construct an instance of Replica Counting where the cost of
the Upwards policy is arbitrarily lower than the cost of the Closest policy. We consider the tree
network of Figure 2, where there are $2n + 2$ internal nodes, each with $W_j = W = n$, and $2n + 1$
clients, each with $r_i = r = 1$.

With the Upwards policy, we place three replicas in $s_{2n}$, $s_{2n+1}$ and $s_{2n+2}$. All requests can be
satisfied with these three replicas.



INRIA 



Strategies for Replica Placement in Tree Networks 







Figure 2: Upwards versus Closest



When considering the Closest policy, first we need to place a replica in $s_{2n+2}$ to cover its client.
Then,

• Either we place a replica on $s_{2n+1}$. In this case, this replica is handling $n$ requests, but there
remain $n$ other requests from the $2n$ clients in its subtree that cannot be processed by $s_{2n+2}$.
Thus, we need to add $n$ replicas among $s_1, \ldots, s_{2n}$.

• Otherwise, $n - 1$ requests of the $2n$ clients in the subtree of $s_{2n+1}$ can be processed by $s_{2n+2}$
in addition to its own client. We need to add $n + 1$ extra replicas among $s_1, s_2, \ldots, s_{2n}$.

In both cases, we are placing $n + 2$ replicas, instead of the 3 replicas needed with the Upwards policy.
This proves that Upwards can be arbitrarily better than Closest on some Replica Counting
instances.
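
As a quick check of the counts above (a restatement of the example, not an additional result), the two Closest cases and the resulting cost ratio can be written as:

```latex
\underbrace{1}_{s_{2n+2}} + \underbrace{1}_{s_{2n+1}} + n \;=\; n + 2,
\qquad
\underbrace{1}_{s_{2n+2}} + (n + 1) \;=\; n + 2,
\qquad
\frac{\text{cost of Closest}}{\text{cost of Upwards}} \;=\; \frac{n+2}{3}
\;\longrightarrow\; \infty \quad (n \to \infty).
```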

3.3 Multiple versus Upwards 

In this section we build an instance of the Replica Counting problem where Multiple is twice
as good as Upwards. We do not know whether there exist instances of Replica Counting
where the performance ratio of Multiple versus Upwards is higher than 2 (and we conjecture that
this is not the case). However, we also build an instance of the Replica Cost problem (with
heterogeneous nodes) where Multiple is arbitrarily better than Upwards.

We start with the homogeneous case. Consider the instance of Replica Counting represented
in Figure 3, with $3n + 1$ nodes of capacity $W_j = W = 2n$. The root $r$ has $n + 1$ children: $n$ nodes
labeled $s_1$ to $s_n$, and a client with $r_i = n$. Each node $s_j$ has two children nodes, labeled $v_j$ and $w_j$,
for $1 \le j \le n$. Each node $v_j$ has a unique child, a client with $r_i = n$ requests; each node $w_j$ has a
unique child, a client with $r_i = n + 1$ requests.

The Multiple policy assigns $n + 1$ replicas, one to the root $r$ and one to each node $s_j$. The
replica in $s_j$ can process all the $2n + 1$ requests in its subtree except one, which is processed by
the root.

For the Upwards policy, we need to assign one replica to $r$, to cover its client. This replica can
process $n$ other requests, for instance those from the client child of $v_1$. We need to place at least
a replica in $s_1$ or in $w_1$, and $2(n - 1)$ replicas in $v_j$ and $w_j$ for $2 \le j \le n$. This leads to a total of
$2n$ replicas, hence a performance factor whose limit is 2 when $n$ tends to infinity.
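
Spelling out the arithmetic of this homogeneous example (a restatement of the counts given above, under no further assumption):

```latex
\text{Upwards: } \underbrace{1}_{r} + \underbrace{1}_{s_1 \text{ or } w_1} + 2(n-1) = 2n,
\qquad
\text{Multiple: } n + 1,
\qquad
\frac{2n}{n+1} = 2 - \frac{2}{n+1} \;\longrightarrow\; 2 \quad (n \to \infty).
```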

Figure 3: Multiple versus Upwards, homogeneous platforms.

We now proceed to the heterogeneous case. Consider the instance of Replica Cost represented
in Figure 4, with 3 nodes $s_1$, $s_2$ and $s_3$, and 2 clients. The capacity of $s_1$ and $s_2$ is
$W_1 = W_2 = n$ while that of $s_3$ is $W_3 = Kn$, where $K$ is arbitrarily large. Recall that in the
Replica Cost problem, we let $sc_j = W_j$ for each node. Multiple assigns 2 replicas, in $s_1$ and $s_2$,
hence has cost $2n$. The Upwards policy assigns a replica to $s_1$ to cover its child, and then cannot
use $s_2$ to process the requests of the child in its subtree. It must place a replica in $s_3$, hence a
final cost $n + Kn = (K + 1)n$, arbitrarily higher than that of Multiple.

Figure 4: Multiple versus Upwards, heterogeneous platforms ($W_1 = W_2 = n$, $W_3 = Kn$; the two clients issue $n - 1$ and $n + 1$ requests).



3.4 Lower bound for the Replica Counting problem 



Obviously, the cost of an optimal solution of the Replica Counting problem (for any policy)
cannot be lower than the obvious lower bound $\left\lceil \frac{\sum_{i \in \mathcal{C}} r_i}{W} \right\rceil$, where $W$ is the server capacity. Indeed,
this corresponds to a solution where the total request load is shared as evenly as possible among
the replicas.

The following instance of Replica Counting shows that the optimal cost can be arbitrarily
higher than this lower bound. Consider Figure 5, with $n + 1$ nodes of capacity $W_j = W$. The root
$r$ has $n + 1$ children: $n$ nodes labeled $s_1$ to $s_n$, and a client with $r_i = W$. Each node $s_j$ has a
unique child, a client with $r_i = W/n$ (assume without loss of generality that $W$ is divisible by $n$).
The lower bound is $\left\lceil \frac{\sum_{i \in \mathcal{C}} r_i}{W} \right\rceil = \frac{2W}{W} = 2$. However, each of the three policies Closest, Upwards
and Multiple will assign a replica to the root to cover its client, and will then need $n$ extra replicas,
one per client of $s_j$, $1 \le j \le n$. The total cost is thus $n + 1$ replicas, arbitrarily higher than the
lower bound.
Figure 5: The lower bound cannot be approximated for Replica Counting.
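
The gap between the lower bound and the optimal cost on this instance can be tabulated with a few lines of code. This is only an illustrative sketch: the helper names `replica_lower_bound` and `figure5_requests` are hypothetical, and the value $n + 1$ printed in the last column is the optimal cost argued in the text, not something the code computes.

```python
def replica_lower_bound(requests, W):
    """Ceiling of (total number of requests) / W, using integer arithmetic."""
    total = sum(requests)
    return -(-total // W)


def figure5_requests(n, W):
    """Requests of the Figure 5 instance: one client with W, n clients with W/n."""
    assert W % n == 0, "W is assumed to be divisible by n"
    return [W] + [W // n] * n


for n in (2, 5, 10):
    reqs = figure5_requests(n, W=100)
    # columns: n, lower bound, optimal cost n + 1 (from the argument in the text)
    print(n, replica_lower_bound(reqs, 100), n + 1)
```

The bound stays at 2 whatever the value of $n$, while the number of replicas actually needed grows with $n$.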

All the examples in Sections 3.1 to 3.4 give insight into the combinatorial nature of the
Replica Placement optimization problem, even in its simplest variants Replica Cost and
Replica Counting. The following section corroborates this insight: most problems are shown to be
NP-hard, even though some variants have polynomial complexity.



4 Complexity results 

One major goal of this paper is to assess the impact of the access policy on the problem with
homogeneous vs heterogeneous servers. We restrict to the simplest problem, namely the Replica
Cost problem introduced in Section 2.2.3. We consider a tree $T = \mathcal{C} \cup \mathcal{N}$, no QoS constraint,
and infinite link capacities. Each client $i \in \mathcal{C}$ has $r_i$ requests; each node $j \in \mathcal{N}$ has processing
capacity $W_j$ and storage cost $sc_j = W_j$. This simple problem comes in two flavors, either with
homogeneous nodes ($W_j = W$ for all $j \in \mathcal{N}$), or with heterogeneous nodes (servers with different
capacities/costs).

In the single server version of the problem, we need to find a server $\mathrm{server}(i)$ for each client
$i \in \mathcal{C}$. Let Servers be the set of servers chosen among the nodes in $\mathcal{N}$. The only constraint is that
server capacities cannot be exceeded: this translates into

$\sum_{i \in \mathcal{C},\ \mathrm{server}(i) = j} r_i \le W_j$ for all $j \in \mathrm{Servers}$.

The objective is to find a valid solution of minimal storage cost $\sum_{j \in \mathrm{Servers}} W_j$. Note that with
homogeneous nodes, the problem reduces to finding the minimum number of servers, i.e. to the
Replica Counting problem. As outlined in Section 3, there are two variants of the single server
version of the problem, namely the Closest and the Upwards strategies.

In the Multiple policy with multiple servers per client, let Servers be the set of servers chosen
among the nodes in $\mathcal{N}$; for any client $i \in \mathcal{C}$ and any node $j \in \mathcal{N}$, let $r_{i,j}$ be the number of requests
from $i$ that are processed by $j$ ($r_{i,j} = 0$ if $j \notin \mathrm{Servers}$). We need to ensure that

$\sum_{j \in \mathcal{N}} r_{i,j} = r_i$ for all $i \in \mathcal{C}$.

The capacity constraint now writes

$\sum_{i \in \mathcal{C}} r_{i,j} \le W_j$ for all $j \in \mathrm{Servers}$,

while the objective function is the same as for the single server version. 

The decision problems associated with the previous optimization problems are easy to formu- 
late: given a bound on the number of servers (homogeneous version) or on the total storage cost 
(heterogeneous version), is there a valid solution that meets the bound? 
             Homogeneous         Heterogeneous
Closest      polynomial [2, 9]   NP-complete
Upwards      NP-complete         NP-complete
Multiple     polynomial          NP-complete

Table 1: Complexity results for the different instances of the Replica Cost problem.

Table 1 captures the complexity results. These complexity results are all new, except for the
Closest/Homogeneous combination. The NP-completeness of the Upwards/Homogeneous case
comes as a surprise, since all previously known instances were shown to be polynomial, using
dynamic programming algorithms. In particular, the Closest/Homogeneous variant remains polynomial
when adding communication costs [2] or QoS constraints [9]. Previous NP-completeness
results involved general graphs rather than trees, and the combinatorial nature of the problem
came from the difficulty of extracting a good replica tree out of an arbitrary communication graph.
Here the tree is fixed, but the problem remains combinatorial due to resource heterogeneity.

4.1 With homogeneous nodes and the Multiple strategy 

Theorem 1. The instance of the Replica Counting problem with the Multiple strategy can be 
solved in polynomial time. 

Proof. We outline below an optimal algorithm to solve the problem. The proof of optimality is 
quite technical, so the reader may want to skip it at first reading. □ 

4.1.1 Algorithm for multiple servers 

We propose a greedy algorithm to solve the Replica Counting problem. Let W be the total 
number of requests that a server can handle. 

This algorithm works in three passes: first we select the nodes which will have a replica handling 
exactly W requests. Then a second pass allows us to select some extra servers which are fulfilling 
the remaining requests. Finally, we need to decide for each server how many requests of each client 
it is processing. 

We assume that each node $i$ knows its parent $\mathrm{parent}(i)$ and its children $\mathrm{children}(i)$ in the tree.
We introduce a new variable which is the flow coming up in the tree (requests which are not
already fulfilled by a server). It is denoted by $\mathrm{flow}_i$ for the flow between $i$ and $\mathrm{parent}(i)$. Initially,
$\forall i \in \mathcal{C}$, $\mathrm{flow}_i = r_i$ and $\forall i \in \mathcal{N}$, $\mathrm{flow}_i = -1$. Moreover, the set of replicas is empty in the beginning:
$\mathrm{repl} = \emptyset$.

Pass 1— We greedily select in this step some nodes which will process $W$ requests and which
are as close to the leaves as possible. We place a replica on such nodes (see Algorithm 1).
Procedure pass1 is called with $r$ (root of the tree) as a parameter, and it goes down the tree
recursively in order to compute the flows. When a flow exceeds $W$, we place a replica since
the corresponding server will be fully used, and we remove the processed requests from the
flow going upwards.

At the end, if $\mathrm{flow}_r = 0$ or ($\mathrm{flow}_r \le W$ and $r \notin \mathrm{repl}$), we have an optimal solution since
all replicas which have been placed are fully used and all requests are satisfied by adding a
replica in $r$ if $\mathrm{flow}_r \neq 0$. In this case we skip pass 2 and go directly to pass 3.

Otherwise, we need some extra replicas since some requests are not satisfied yet, and the 
root cannot satisfy all the remaining requests. To place these extra replicas, we go through 
pass 2. 

procedure pass1 (node s ∈ N)
begin
    flow_s = 0;
    for i ∈ children(s) do
        if flow_i == -1 then pass1(i);   // Recursive call.
        flow_s = flow_s + flow_i;
    end
    if flow_s ≥ W then flow_s = flow_s - W; repl = {s} ∪ repl;
end

Algorithm 1: Procedure pass1

Pass 2— In this pass, we need to select the nodes where to add replicas. To do so, while there are
too many requests going up to the root, we select the node which can process the highest
number of requests, and we place a replica there. The number of requests that a node
$j \in \mathcal{N}$ can eventually process is the minimum of the flows between $j$ and the root $r$, denoted
$\mathrm{uflow}_j$ (for useful flow). Indeed, some requests may have no server yet, but they might be
processed by a server on the path between $j$ and $r$, where a replica has been placed in pass 1.
Algorithm 2 details this pass.

If we exit this pass with finish = -1, this means that we have tried to place replicas on 
all nodes, but this solution is not feasible since there are still some unprocessed requests 
going up to the root. In this case, the original problem instance has no solution. 

However, if we succeed in placing replicas such that $\mathit{flow}_r = 0$, we have a set of replicas which 
processes all requests. We then go through pass 3 to assign requests to servers, i.e. 
to compute how many requests of each client should be processed by each server. 

    while flow_r ≠ 0 do 
        freenode = N \ repl; 
        if freenode == ∅ then finish = -1; exit the loop; 
        // At each step, assign 1 replica and re-compute flows. 
        child = children(r); uflow_r = flow_r; 
        while child ≠ ∅ do 
            remove j from child; 
            uflow_j = min(flow_j, uflow_parent(j)); 
            child = child ∪ children(j); 
        end 
        // The useful flows have been computed, select the max. 
        maxuflow = 0; 
        for j ∈ freenode do 
            if uflow_j > maxuflow then maxuflow = uflow_j; maxnode = j; 
        end 
        if maxuflow ≠ 0 then 
            repl = repl ∪ {maxnode}; 
            // Update the flows upwards. 
            for j ∈ Ancestors(maxnode) ∪ {maxnode} do flow_j = flow_j - maxuflow; 
        end 
        else finish = -1; exit the loop; 
    end 

Algorithm 2: Pass 2 
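
As a hedged illustration of pass 2, the following Python sketch recomputes the useful flows at each step and adds one replica per iteration. The data layout (`children`, `parent`, the ordered list `internal` of internal nodes, assumed to be in depth-first order so that ties go to the first node) and the hard-coded flows "after pass 1" are assumptions of the example, not part of the original algorithm description.

```python
def pass2(children, parent, internal, flow, repl, root):
    """Add replicas until no request reaches the root.

    internal lists the internal nodes in depth-first order (used for ties).
    flow and repl are the values left by pass 1; both are updated in place.
    Returns True on success, False if the instance has no solution.
    """
    while flow[root] != 0:
        free = [j for j in internal if j not in repl]
        if not free:
            return False
        # Useful flow: minimum of the flows on the path from j up to the root.
        uflow = {root: flow[root]}
        todo = list(children.get(root, []))
        while todo:
            j = todo.pop()
            uflow[j] = min(flow[j], uflow[parent[j]])
            todo.extend(children.get(j, []))
        best = max(free, key=lambda j: uflow[j])   # first maximum wins ties
        if uflow[best] == 0:
            return False
        repl.add(best)
        gain = uflow[best]
        node = best
        while True:              # update the flows on the path best -> root
            flow[node] -= gain
            if node == root:
                break
            node = parent[node]
    return True


# Hypothetical instance with W = 10: clients c1 and c2 issue 6 requests each.
children = {"r": ["a", "b"], "a": ["c1"], "b": ["c2"]}
parent = {"a": "r", "b": "r", "c1": "a", "c2": "b"}
flow = {"c1": 6, "c2": 6, "a": 6, "b": 6, "r": 2}   # flows after pass 1
repl = {"r"}                                        # r was saturated in pass 1
ok = pass2(children, parent, ["r", "a", "b"], flow, repl, "r")
print(ok, sorted(repl), flow["a"], flow["r"])
```

Both free nodes have useful flow 2 here, so the first one in the given order receives the replica and the flow at the root drops to 0.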

Pass 3: This pass is in fact straightforward: starting from the leaves, it distributes the requests 
to the servers from the bottom to the top of the tree. We decide for instance to assign 
requests from clients starting from the left. Procedure pass3 is called with $r$ (root of the tree) 
as a parameter, and it goes down the tree recursively (cf. Algorithm 3). For $i \in \mathcal{C}$, $r'_i$ 
is the number of requests of $i$ not yet assigned to a server (initially $r'_i = r_i$). $w_{s,i}$ is the 
number of requests of client $i$ assigned to server $s \in \mathcal{N}$, and $w_s \le W$ is the total number of 
requests assigned to $s$. $C(s)$ is the set of clients in subtree($s$) which still have some requests 
not assigned. Initially, $C(i) = \{i\}$ for $i \in \mathcal{C}$, and $C(s) = \emptyset$ otherwise. 

Note that a server which was computing W requests in pass 1 may end up computing fewer 
requests if one of its descendants in the tree has earned a replica in pass 2. But this does 
not affect the optimality of the result, since we keep the same number of replicas. 

    procedure pass3 (node s ∈ N) 
    begin 
        w_s = 0; 
        for i ∈ children(s) do 
            if C(i) == ∅ then pass3(i);   // Recursive call. 
            C(s) = C(s) ∪ C(i); 
        end 
        if s ∈ repl then 
            for i ∈ C(s) do 
                if r'_i ≤ W - w_s then C(s) = C(s) \ {i}; w_{s,i} = r'_i; w_s = w_s + r'_i; r'_i = 0; 
            end 
            if C(s) ≠ ∅ then Let i ∈ C(s); x = W - w_s; r'_i = r'_i - x; w_{s,i} = x; w_s = W; 
        end 
    end 

Algorithm 3: Procedure pass3 
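
The assignment step can be illustrated with the following Python sketch. It follows the structure of Algorithm 3 under assumed data structures (`children`, `requests`, a set `repl` of replicas); the `load < W` guard, which avoids recording empty assignments, is an addition of this sketch rather than part of the pseudocode.

```python
def pass3(children, requests, repl, root, W):
    """Assign requests to replicas bottom-up; returns (assign, unserved)."""
    remaining = dict(requests)   # r'_i: requests of client i not yet assigned
    assign = {}                  # (server, client) -> number of requests

    def visit(s):
        if s in requests:                      # s is a client
            return [s]
        pending = []                           # C(s): clients still waiting
        for c in children[s]:
            pending.extend(visit(c))
        if s not in repl:
            return pending
        load, waiting = 0, []
        for i in pending:
            if remaining[i] <= W - load:       # client i fits entirely
                assign[(s, i)] = remaining[i]
                load += remaining[i]
                remaining[i] = 0
            else:
                waiting.append(i)
        if waiting and load < W:               # fill s with part of one client
            i, x = waiting[0], W - load
            assign[(s, i)] = x
            remaining[i] -= x
        return waiting

    unserved = visit(root)
    return assign, unserved


# Hypothetical instance: replicas on a and r, W = 10.
children = {"r": ["a", "c3"], "a": ["c1", "c2"]}
requests = {"c1": 12, "c2": 3, "c3": 4}
assign, unserved = pass3(children, requests, {"a", "r"}, "r", W=10)
print(assign, unserved)
```

In this run the client with 12 requests is split between the two replicas (7 requests on `a`, 5 on `r`), which is exactly the kind of sharing that the Multiple policy allows.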

The proof in Section 4.1.3 shows the equivalence between the solution built by this algorithm 
and any optimal solution, thus proving the optimality of the algorithm. The following example 
illustrates the step by step execution of the algorithm. 

4.1.2 Example 

Figure 6(a) provides an example of network on which we are placing replicas with the Multiple 
strategy. The network is thus homogeneous and we fix W = 10. 

Pass 1 of the algorithm is quite straightforward to unroll; Figure 6(b) indicates the flow 
on each link, and the saturated replicas are the black nodes. 

During pass 2, we select the nodes of maximum useful flow. Figure 6(c) represents these useful 
flows; the node with the maximum useful flow (7) is assigned a replica, 
and the useful flows are updated. All the useful flows are then reduced down to 1 since there is only 1 
request going through the root $n_1$. The first node of useful flow 1 in the traversal is then selected 
and becomes a replica of pass 2. The flow at the root is then 0 and it is the end of pass 2. 

Finally, pass 3 assigns the servers to the clients and decides which requests are served by which 
replica (Figure 6(d)). For instance, the client with 12 requests shares its requests between two servers 
(10 requests on one, 2 requests on the other). Requests are assigned from the bottom of the tree up to the top. 
Note that the root $n_1$, even though it was a saturated replica of pass 1, has only 5 requests to 
process in the end. 

4.1.3 Proof of optimality 

Let $R_{opt}$ be an optimal solution to an instance of the problem. The core of the proof consists in 
transforming this solution into an equivalent canonical optimal solution $R_{can}$. We will then show 
that our algorithm is building this canonical solution, and thus it is producing an optimal solution. 
Each server $s \in R_{opt}$ is serving $w_{s,i}$ requests of client $i \in \text{subtree}(s) \cap \mathcal{C}$, and 

$$w_s = \sum_{i \in \text{subtree}(s) \cap \mathcal{C}} w_{s,i} \le W.$$ 

For each $i \in \mathcal{C}$, $w_{s,i} = 0$ if $s \in \mathcal{N}$ is not a replica, and $\sum_{s \in \text{Ancestors}(i)} w_{s,i} = r_i$. 

Figure 6: Algorithm for the Replica Counting problem with the Multiple strategy (W = 10). Panels: (a) Initial network, (b) Pass 1, (c) Pass 2, (d) Pass 3. 


We define the flow of node $k$, $\mathit{flow}_k$, as the number of requests going through this node up to 
its parent. Thus, for $i \in \mathcal{C}$, $\mathit{flow}_i = r_i$, while for a node $s \in \mathcal{N}$, 

$$\mathit{flow}_s = \sum_{i \in \text{children}(s)} \mathit{flow}_i - w_s.$$ 

The total flow going through the tree, tflow, is defined in a similar way, except that we do not 
remove from the flow the requests processed by a replica, i.e. $\mathit{tflow}_s = \sum_{i \in \text{children}(s)} \mathit{tflow}_i$. We 
thus have 

$$\mathit{tflow}_s = \sum_{i \in \text{subtree}(s) \cap \mathcal{C}} r_i.$$ 

These variables are completely defined by the network and the optimal solution $R_{opt}$. 
A first lemma shows that it is possible to change request assignments while keeping an optimal 
solution. The flows need to be recomputed after any such modification. 

Lemma 1. Let $s \in \mathcal{N} \cap R_{opt}$ be a server such that $w_s < W$. 

• If $\mathit{tflow}_s \ge W$, we can change the request assignment between replicas of the optimal solution, 
in such a way that $w_s = W$. 

• Otherwise, we can change the request assignment so that $w_s = \mathit{tflow}_s$. 

Proof. First we point out that the clients in subtree($s$) can all be served by $s$, and since $R_{opt}$ is 
a solution, these requests are served by a replica somewhere in the tree. We do not modify the 
optimality of the solution by changing the $w_{s,i}$; this only affects the flows of the solution. Thus, for 
a given client $i \in \text{subtree}(s) \cap \mathcal{C}$, if there is a replica $s' \neq s$ on the path between $i$ and the root, we 
can change the assignment of the requests of client $i$. Let $x = \min(w_{s',i},\ W - w_s)$, the largest number of 
requests of $i$ that $s'$ can give up and $s$ can still accept. Then we move 
$x$ requests, i.e. $w_{s',i} = w_{s',i} - x$ and $w_{s,i} = w_{s,i} + x$. From the definition of $\mathit{tflow}_s$, we obtain the 
result if we move all possible requests to $s$ until there are no more requests in the subtree or until 
$s$ is processing $W$ requests. □ 

We now introduce a new definition, completely independent from the optimal solution but 
related to the tree network. The canonical flow is obtained by distinguishing nodes which receive 
a flow greater than or equal to $W$ from the other nodes. We compute the canonical flow cflow of the tree, 
independently of the replica placement, and define a subset of nodes which are saturated, $SN$. We 
also compute the number of saturated nodes in subtree($k$), denoted $\mathit{nsn}_k$, for any node $k \in \mathcal{C} \cup \mathcal{N}$ 
of the tree. 

For $i \in \mathcal{C}$, $\mathit{cflow}_i = r_i$ and $\mathit{nsn}_i = 0$, and we then compute recursively the canonical flows for 
nodes $s \in \mathcal{N}$. Let $f_s = \sum_{i \in \text{children}(s)} \mathit{cflow}_i$ and $x_s = \sum_{i \in \text{children}(s)} \mathit{nsn}_i$. If $f_s \ge W$ then $s \in SN$, 
$\mathit{cflow}_s = f_s - W$ and $\mathit{nsn}_s = x_s + 1$. Otherwise, $s$ is not saturated, $\mathit{cflow}_s = f_s$ and $\mathit{nsn}_s = x_s$. 
We can deduce from these definitions the following results: 

Proposition 1. A non saturated node always has a canonical flow less than $W$: 
$\forall s \in \mathcal{N} \setminus SN,\ \mathit{cflow}_s < W$. 

Lemma 2. For all nodes $s \in \mathcal{C} \cup \mathcal{N}$, $\mathit{cflow}_s = \mathit{tflow}_s - \mathit{nsn}_s \times W$. 

Corollary 1. For all nodes $s \in \mathcal{C} \cup \mathcal{N}$, $\mathit{tflow}_s \ge \mathit{nsn}_s \times W$. 

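Proof. Proposition 1 is trivial due to the definition of the canonical flow. 
Lemma 2 can be proved recursively on the tree. 

• This property is true for the clients: for $i \in \mathcal{C}$, $\mathit{nsn}_i = 0$ and $\mathit{tflow}_i = \mathit{cflow}_i = r_i$. 

• Let $s \in \mathcal{N}$, and let us assume that the property is true for all children of $s$. Then, 
$\forall j \in \text{children}(s),\ \mathit{cflow}_j = \mathit{tflow}_j - \mathit{nsn}_j \times W$. 

  - If $s \notin SN$, $\mathit{nsn}_s = \sum_{j \in \text{children}(s)} \mathit{nsn}_j$ and 

    $$\mathit{cflow}_s = \sum_{j \in \text{children}(s)} \mathit{cflow}_j = \sum_{j \in \text{children}(s)} (\mathit{tflow}_j - \mathit{nsn}_j \times W) = \mathit{tflow}_s - \mathit{nsn}_s \times W.$$ 

  - If $s \in SN$, $\mathit{nsn}_s = \left(\sum_{j \in \text{children}(s)} \mathit{nsn}_j\right) + 1$ and 

    $$\mathit{cflow}_s = \sum_{j \in \text{children}(s)} \mathit{cflow}_j - W = \sum_{j \in \text{children}(s)} (\mathit{tflow}_j - \mathit{nsn}_j \times W) - W$$ 
    $$= \mathit{tflow}_s - (\mathit{nsn}_s - 1) \times W - W = \mathit{tflow}_s - \mathit{nsn}_s \times W,$$ 

which proves the result. Corollary 1 is trivially deduced from Lemma 2 since cflow is a nonnegative 
function. □ 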
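
The recursive definitions of tflow, cflow and nsn translate directly into a bottom-up traversal, which also makes it easy to check Lemma 2 numerically on a small tree. The sketch below is only illustrative: the tree encoding and the instance are assumptions, and saturation is tested with $f_s \ge W$ as in the definition above.

```python
def canonical_flows(children, requests, root, W):
    """Return (tflow, cflow, nsn, SN) computed bottom-up."""
    tflow, cflow, nsn, SN = {}, {}, {}, set()

    def visit(k):
        if k in requests:                       # client: cflow = tflow = r_k
            tflow[k] = cflow[k] = requests[k]
            nsn[k] = 0
            return
        for c in children[k]:
            visit(c)
        f = sum(cflow[c] for c in children[k])
        x = sum(nsn[c] for c in children[k])
        tflow[k] = sum(tflow[c] for c in children[k])
        if f >= W:                              # k is saturated
            SN.add(k)
            cflow[k], nsn[k] = f - W, x + 1
        else:
            cflow[k], nsn[k] = f, x

    visit(root)
    return tflow, cflow, nsn, SN


# Hypothetical instance with W = 10.
W = 10
children = {"r": ["a", "c3"], "a": ["c1", "c2"]}
requests = {"c1": 12, "c2": 3, "c3": 4}
tflow, cflow, nsn, SN = canonical_flows(children, requests, "r", W)
lemma2_holds = all(cflow[k] == tflow[k] - nsn[k] * W for k in tflow)
print(sorted(SN), cflow["r"], tflow["r"], lemma2_holds)
```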

We also show that it is always possible to move a replica into a free server which is one of its 
ancestors in the tree, while keeping an optimal solution: 

Proposition 2. Let $R_{opt}$ be an optimal solution, and let $s \in R_{opt}$. If $\exists s' \in \text{Ancestors}(s) \setminus R_{opt}$, 
then $R'_{opt} = \{s'\} \cup R_{opt} \setminus \{s\}$ is also an optimal solution. 

Proof. $s'$ can handle all requests which were processed by $s$ since $s \in \text{subtree}(s')$. We just need 
to redefine $w_{s',i} = w_{s,i}$ for all $i \in \mathcal{C}$ and then $w_{s,i} = 0$. □ 

We are now ready to transform $R_{opt}$ into a new optimal solution, $R_{sat}$, by redistributing the 
requests among the replicas and moving some replicas, in order to place a replica at each saturated 
node and assign $W$ requests to this replica. This transformation is done starting at the leaves 
of the tree, and considering all nodes of $SN$. Nothing needs to be done for the leaves (the clients) 
since they are not in $SN$. 

Let us consider $s \in SN$, and assume that the optimal solution has already been modified to 
place a replica, and assign it $W$ requests, on all nodes in $\mathit{subSN} = SN \cap \text{subtree}(s) \setminus \{s\}$. 

We need to differentiate two cases: 

1. If $s \in R_{opt}$, we do not need to move any replica. However, if $w_s \neq W$, we change the 
assignment of some requests while keeping the same replicas in order to obtain a workload 
of $W$ on server $s$. We do not remove requests from the saturated servers of subSN which 
have already been filled. Corollary 1 ensures that $\mathit{tflow}_s \ge \mathit{nsn}_s \times W$, and $(\mathit{nsn}_s - 1) \times W$ 
requests should not move since they are assigned to the $\mathit{nsn}_s - 1$ servers of subSN. There 
are thus still at least $W$ requests of clients of subtree($s$) which can possibly be moved to 
$s$ using Lemma 1. 

2. If $s \notin R_{opt}$, we need to move a replica of $R_{opt}$ and place it in $s$ without changing the 
optimality of the solution. We differentiate two subcases. 

(a) If $\exists s_1 \in \text{subtree}(s) \cap R_{opt} \setminus SN$, then the replica placed on $s_1$ can be moved to $s$ by 
applying Proposition 2. Then, if $w_s \neq W$, we apply case 1 above to saturate the server. 

(b) Otherwise, all the replicas placed in subtree($s$) are also in $SN$, and the flow consumed 
by the already modified optimal solution is exactly $(\mathit{nsn}_s - 1) \times W$. It is easy to see 
that the flow (of the optimal solution) at $s$ is exactly equal to the total flow minus the 
consumed flow. Therefore, $\mathit{flow}_s = \mathit{tflow}_s - (\mathit{nsn}_s - 1) \times W$, and with the application 
of Corollary 1, $\mathit{flow}_s \ge W$. 

The idea now consists in assigning the requests of this flow to node $s$ by removing work 
from the replicas upwards to the root, and rearranging the remaining requests to remove 
one replica. The flow $\mathit{flow}_s$ is going upwards to be processed by some of the $\mathit{nr}_s$ replicas 
in $\text{Ancestors}(s) \cap R_{opt}$, denoted $s_1, \dots, s_{\mathit{nr}_s}$, $s_1$ being the closest node to $s$. We can 
remove $W$ of these requests from the flow and assign them to a new replica placed in 
$s$. Let $w_{s_k,s} = \sum_{i \in \text{subtree}(s) \cap \mathcal{C}} w_{s_k,i}$. We have $\sum_{k=1..\mathit{nr}_s} w_{s_k,s} = \mathit{flow}_s$. We move these 
requests from $s_k$ to $s$, starting with $k = 1$. Thus, after the modification, $w_{s_1,s} = 0$. It 
is however possible that $w_{s_1} \neq 0$ since $s_1$ may process requests which are not coming 
from subtree($s$). In this case, we are sure that we have removed enough requests from 
$s_k$, $k = 2..\mathit{nr}_s$, which can instead process the requests still in charge of $s_1$. We can then 
remove the replica initially placed in $s_1$. 

This way, we have not changed the assignment on replicas in subSN, but we have 
placed a replica in $s$ which is processing $W$ requests. Since we have at the same time 
removed the first replica on the path from $s$ to the root ($s_1$), we have not changed the 
number of replicas and the solution is still optimal. 

Once we have applied this procedure up to the root, we have an optimal solution $R_{sat}$ in which 
all nodes of $SN$ have received a replica and are processing $W$ requests. We will not change the 
assignment of these replicas anymore in the following. Free nodes in the new solution are called 
F-nodes, while replicas which are not in $SN$ are called PS-nodes, for partially saturated. 

In a next step, we further modify the optimal solution $R_{sat}$ in order to obtain what we call 
the canonical solution $R_{can}$. To do so, we change the request assignment of the PS-nodes: we 
"saturate" some of them as much as we can and we integrate them into the subset of nodes $SN$, 
redefining the cflow accordingly. At the end of the process, $SN = R_{can}$. 

The cflow is still the flow which has not been processed by a saturated node in the subtree, 
and thus we can express it in a more general way: 

$$\mathit{cflow}_s = \mathit{tflow}_s - \sum_{s' \in SN \cap \text{subtree}(s)} w_{s'}.$$ 

Note that this is totally equivalent to the previous definition as long as we have not modified $SN$. 

We also introduce a new flow definition, the non-saturated flow of $s$, $\mathit{nsflow}_s$, which counts the 
requests going through node $s$ and not served by a saturated server anywhere in the tree. Thus, 

$$\mathit{nsflow}_s = \mathit{cflow}_s - \sum_{i \in \text{subtree}(s) \cap \mathcal{C}}\ \sum_{s' \in \text{Ancestors}(s) \cap SN} w_{s',i}.$$ 

This flow represents the requests that can potentially be served by $s$ while keeping all nodes of 
$SN$ saturated. 

Lemma 3. In a saturated optimal solution, there cannot exist a PS-node in the subtree of another 
PS-node. 

Proof. The non-saturated flow satisfies $\mathit{nsflow}_s \le \mathit{cflow}_s$ since we further remove from the canonical 
flow some requests which are assigned upwards in the tree to some saturated servers. 

Let $s \in R_{sat} \setminus SN$ be a PS-node. Its canonical flow is $\mathit{cflow}_s < W$. It can potentially process 
all the requests of the subtree which are not assigned to a saturated server upwards or downwards 
in the tree, thus $\mathit{nsflow}_s$ requests. Since $\mathit{nsflow}_s \le \mathit{cflow}_s < W$, we can change the request 
assignment to assign all these $\mathit{nsflow}_s$ requests to $s$, possibly removing some work from other 
non-saturated replicas upwards or downwards which were processing these requests. Thus, the 
replica on node $s$ is processing all the requests of subtree($s$) which are not processed by saturated 
nodes. 

If there was a non saturated replica in subtree($s$), it could thus be removed since all the requests 
are processed by $s$. This means that a solution with a PS-node in the subtree of another PS-node 
is not optimal, thus proving the lemma. □ 

At this point, we can move the PS-nodes as high as possible in $R_{sat}$. Let $s$ be a PS-node. If 
there is a free node $s'$ in Ancestors($s$) then we can move the replica from $s$ to $s'$ using Proposition 2. 
Lemma 3 ensures that there are no other PS-nodes in subtree($s'$). 

All further modifications will only alter nodes which have no PS-nodes in their ancestors. We 
define $\mathcal{N}' = \{s \mid \text{Ancestors}(s) \setminus SN = \emptyset\}$. 

Let $s \in \mathcal{N}'$. Then $\mathit{nsflow}_s = \mathit{cflow}_s - \sum_{i \in \text{subtree}(s) \cap \mathcal{C}} \sum_{s' \in \text{Ancestors}(s)} w_{s',i}$ since all ancestors of $s$ 
are in $SN$. Thus, 

$$\mathit{nsflow}_s = \sum_{s' \in \text{subtree}(s) \setminus SN} w_{s'}.$$ 

By definition, $\forall s \in \mathcal{N},\ \mathit{nsflow}_s \le \mathit{cflow}_s$. Moreover, if $s \notin SN$, then $\mathit{nsflow}_s = w_s$ since 
$\text{subtree}(s) \setminus SN$ is reduced to $s$ (no other PS-node under the PS-node $s$, from Lemma 3). 

We introduce a new flow definition, the useful flow, which intuitively represents the number of 
requests that can possibly be processed on $s$ without removing requests from a saturated server. 

$$\mathit{uflow}_s = \min_{s' \in \text{Ancestors}(s) \cup \{s\}} \{\mathit{cflow}_{s'}\}$$ 

Lemma 4. Let $s \in \mathcal{N}'$. Then $\mathit{nsflow}_s \le \mathit{uflow}_s$. 

Proof. Let $s' \in \text{Ancestors}(s)$. Since $s \in \mathcal{N}'$, $s' \in SN$, and 

$$\mathit{cflow}_{s'} \ge \mathit{nsflow}_{s'} = \sum_{s'' \in \text{subtree}(s') \setminus SN} w_{s''}.$$ 

But since $s \in \text{subtree}(s')$, $\text{subtree}(s) \setminus SN \subseteq \text{subtree}(s') \setminus SN$, hence $\mathit{nsflow}_s \le \mathit{nsflow}_{s'}$. Note 
that nsflow is a non decreasing function (when going up the tree). 

Thus, $\forall s' \in \text{Ancestors}(s) \cup \{s\}$, $\mathit{nsflow}_s \le \mathit{cflow}_{s'}$, and by definition of the useful flow, 
$\mathit{nsflow}_s \le \mathit{uflow}_s$. □ 

Now we start the modification of the optimal solution in order to obtain the canonical solution. 
At each step, we select a node $s \in \mathcal{N} \setminus SN$ maximizing the useful flow. If there are several nodes 
of identical uflow, we select the first one in a depth-first traversal of the tree. We will prove 
that we can assign $\mathit{uflow}_s$ requests to this node without unsaturating any server of $SN$. Node $s$ is then 
considered as a saturated node; we recompute the canonical flows (and thus the useful flows) and 
reiterate the process until $\mathit{cflow}_r = 0$, which means that all the requests have been assigned to 
saturated servers. 

Let us explain how to reassign the requests in order to saturate $s$ with $\mathit{uflow}_s$ requests. The 
idea is to remove some requests from Ancestors($s$) in order to saturate $s$, and then to saturate the 
ancestors of $s$ again, by assigning them some requests coming from other non saturated servers. 

First, we note that $\mathit{uflow}_s \le \mathit{cflow}_r = \mathit{nsflow}_r$. Thus, 

$$\mathit{uflow}_s \le \sum_{s' \in \mathcal{N} \setminus SN} w_{s'} = w_s + \sum_{s' \in PS} w_{s'}$$ 

where $PS$ is the set of non saturated nodes without $s$. Let $x = \mathit{uflow}_s - w_s$. If $x = 0$, $s$ is already 
saturated. Otherwise, we need to reassign $x$ requests to $s$. From the previous equation, we can see 
that $\sum_{s' \in PS} w_{s'} \ge \mathit{uflow}_s - w_s = x$. There are thus enough requests handled by non saturated 
nodes which can be passed to $s$. 

The number of requests of $\text{subtree}(s) \cap \mathcal{C}$ handled by Ancestors($s$) is 

$$\sum_{s' \in \text{Ancestors}(s)}\ \sum_{i \in \text{subtree}(s) \cap \mathcal{C}} w_{s',i} = \mathit{cflow}_s - \mathit{nsflow}_s$$ 

by definition of the flow. Now, $\mathit{cflow}_s - \mathit{nsflow}_s \ge \mathit{uflow}_s - w_s = x$, so there are at least $x$ requests 
that $s$ can take from its ancestors. 

Let $a_1 = \text{parent}(s), \dots, a_k = r$ be the ancestors of $s$. $x_j = \sum_{i \in \text{subtree}(s) \cap \mathcal{C}} w_{a_j,i}$ is the amount of 
requests that $s$ can take from $a_j$. We choose arbitrarily where to take the requests if $\sum_j x_j > x$, 
and do not modify the assignment of the other requests. We thus assume in the following that 
$\sum_j x_j = x$. Since these $x_j$ requests are coming from a client in subtree($s$), we can assign them 
to $s$, and there are now only $W - x_j$ requests handled by $a_j$, which means that $a_j$ is temporarily 
unsaturated. However, we have given $x$ extra requests to $s$, hence $s$ is processing $w_s + x = \mathit{uflow}_s$ 
requests. 

We finally need to reassign requests to $a_j$, $j = 1..k$, in order to saturate these nodes again, 
taking requests out of nodes in $PS$ (non saturated nodes other than $s$). This is done iteratively 
starting with $j = 1$ and going up to the root $a_k$. At each step $j$, we assume that the nodes $a_{j'}$, $j' < j$, have 
already been saturated again and we should not move requests away from them. However, we can 
still possibly take requests away from $a_{j''}$, $j'' > j$. 

In order to saturate $a_j$, we need to take: 

• either requests from $\text{subtree}(a_j) \cap \mathcal{C}$ which are currently handled by $a_{j''}$, $j'' > j$, but without 
moving requests which are already assigned to $s$ (i.e. $\sum_{j'' > j} x_{j''}$), 

• or requests from non saturated servers in subtree($a_j$), except requests from $s$ and requests 
already given to $s$ that should not be moved any more (i.e. $\sum_{j' < j} x_{j'}$). 

The number of requests that we can potentially assign to $a_j$ is therefore: 

$$X = \sum_{s' \in \text{subtree}(a_j) \setminus SN \setminus \{s\}} w_{s'} + \sum_{i \in \text{subtree}(a_j) \cap \mathcal{C}}\ \sum_{s' \in \text{Ancestors}(a_j)} w_{s',i} - \sum_{j' < j} x_{j'} - \sum_{j'' > j} x_{j''}.$$ 

Let us show that $X \ge x_j$. Then we can use these requests to saturate $a_j$ again. 

$$\mathit{cflow}_{a_j} = \mathit{nsflow}_{a_j} + \sum_{i \in \text{subtree}(a_j) \cap \mathcal{C}}\ \sum_{s' \in \text{Ancestors}(a_j)} w_{s',i} = w_s + X + \sum_{j' < j} x_{j'} + \sum_{j'' > j} x_{j''} = X + w_s + x - x_j$$ 

But $\mathit{cflow}_{a_j} \ge \mathit{uflow}_s$ and $\mathit{uflow}_s - w_s = x$, so 

$$X = \mathit{cflow}_{a_j} - w_s - x + x_j \ge \mathit{uflow}_s - w_s - x + x_j = x_j.$$ 

It is thus possible to saturate $s$ and then keep its ancestors saturated. At this point, $s$ becomes 
a node of $SN$ and we can recompute the canonical and non saturated flows. We have removed 
$\mathit{uflow}_s$ requests which were processed by non saturated servers, so the cflow and nsflow of all 
ancestors of $s$, including $s$, should be decreased by $\mathit{uflow}_s$. 

In particular, at the root, $\mathit{cflow}_r$ becomes $\mathit{cflow}_r - \mathit{uflow}_s$, which proves that the contribution of $s$ 
to $\mathit{cflow}_r$ is $\mathit{uflow}_s$. 

In the last step of the proof, we show that the canonical solution $R_{can} = SN$ obtained at the end 
of the iteration has exactly the same number of replicas as $R_{sat}$. 
In the saturated solution, each PS-node $s$ is processing $\mathit{nsflow}_s$ requests, while in the canonical 
solution, it is processing $\mathit{uflow}_s$ requests. However, at every step when adding a saturated node $s$, $\mathit{uflow}_s$ is 
greater than or equal to any of the nsflow values. It is thus easy to see that the number of nodes in the canonical 
solution is less than or equal to the number of nodes in the saturated solution. Since the saturated 
solution is optimal, $|R_{can}| = |R_{sat}|$, which completes the proof. 

Our algorithm builds $R_{can}$ in polynomial time, which settles the complexity of the problem. 

4.2 With homogeneous nodes and the Upwards strategy 

Theorem 2. The instance of the Replica Counting problem with the Upwards strategy is 
NP-complete in the strong sense. 

Proof. The problem clearly belongs to the class NP: given a solution, it is easy to verify in 
polynomial time that all requests are served and that no server capacity is exceeded. To establish 
the completeness in the strong sense, we use a reduction from 3-PARTITION [3]. We consider an 
instance $\mathcal{I}_1$ of 3-PARTITION: given $3m$ positive integers $a_1, a_2, \dots, a_{3m}$ such that $B/4 < a_i < B/2$ 
for $1 \le i \le 3m$, and $\sum_{i=1}^{3m} a_i = mB$, can we partition these integers into $m$ triples, each of sum 
$B$? We build the following instance $\mathcal{I}_2$ of Replica Counting (see Figure 7): 

• $3m$ clients $c_i$ with $r_i = a_i$ for $1 \le i \le 3m$. 

• $m$ internal nodes $n_j$ with $W_j = sc_j = B$ for $1 \le j \le m$. 

  - The children of $n_1$ are all the $3m$ clients $c_i$, and its parent is $n_2$. 

  - For $2 \le j \le m$, the only child of $n_j$ is $n_{j-1}$. For $1 \le j \le m - 1$, the parent of $n_j$ is $n_{j+1}$ 
    (hence $n_m$ is the root). 

Finally, we ask whether there exists a solution with total storage cost $mB$, i.e. with a replica 
located at each internal node. Clearly, the size of $\mathcal{I}_2$ is polynomial (and even linear) in the size of 
$\mathcal{I}_1$. 

Figure 8: The platform used in the reduction for Theorem 3. 



We now show that instance $\mathcal{I}_1$ has a solution if and only if instance $\mathcal{I}_2$ does. Suppose first that 
$\mathcal{I}_1$ has a solution. Let $(a_{k_1}, a_{k_2}, a_{k_3})$ be the $k$-th triple in $\mathcal{I}_1$. We assign the three clients $c_{k_1}$, $c_{k_2}$ 
and $c_{k_3}$ to server $n_k$. Because $a_{k_1} + a_{k_2} + a_{k_3} = B$, no server capacity is exceeded. Because the $m$ 
triples partition the $a_i$, all requests are satisfied. We do have a solution to $\mathcal{I}_2$. 

Suppose now that $\mathcal{I}_2$ has a solution. Let $I_k$ be the set of clients served by node $n_k$ if there 
is a replica located at $n_k$: then $\sum_{i \in I_k} a_i \le B$. The total number of requests to be satisfied is 
$\sum_{i=1}^{3m} a_i = mB$, and there are at most $m$ replicas of capacity $B$. Hence no set $I_k$ can be empty, 
and $\sum_{i \in I_k} a_i = B$ for $1 \le k \le m$. Because $B/4 < a_i < B/2$, each $I_k$ must be a triple. This leads 
to the desired solution of $\mathcal{I}_1$. □ 
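
To make the construction concrete, the following Python sketch builds the chain instance from a 3-PARTITION input and checks a single-server (Upwards) assignment against the server capacities. It is only an illustration: the function names, the node labels `c1..c3m` and `n1..nm`, and the sample numbers are assumptions of this example.

```python
def build_instance(a, B):
    """Chain n1 -> n2 -> ... -> nm with all 3m clients attached to n1."""
    m = len(a) // 3
    assert len(a) == 3 * m and sum(a) == m * B
    requests = {f"c{i + 1}": a[i] for i in range(3 * m)}
    parent = {c: "n1" for c in requests}
    for j in range(1, m):
        parent[f"n{j}"] = f"n{j + 1}"          # n_m is the root
    capacity = {f"n{j}": B for j in range(1, m + 1)}
    return requests, parent, capacity


def is_valid_upwards(requests, parent, capacity, server_of):
    """Check a single-server assignment: ancestors only, capacities respected."""
    if set(server_of) != set(requests):
        return False
    load = {}
    for client, server in server_of.items():
        node = parent[client]
        while node != server:                  # walk up until the server is met
            if node not in parent:             # reached the root without success
                return False
            node = parent[node]
        load[server] = load.get(server, 0) + requests[client]
    return all(load[s] <= capacity[s] for s in load)


a, B = [5, 5, 6, 5, 5, 6], 16          # B/4 < a_i < B/2 and sum(a) = 2B
requests, parent, capacity = build_instance(a, B)
triples = [(1, 2, 3), (4, 5, 6)]       # a solution of 3-PARTITION
good = {f"c{i}": f"n{k + 1}" for k, t in enumerate(triples) for i in t}
bad = {c: "n1" for c in requests}      # overloads n1
print(is_valid_upwards(requests, parent, capacity, good))
print(is_valid_upwards(requests, parent, capacity, bad))
```

The assignment derived from the triples uses every internal node at exactly its capacity $B$, which matches the storage cost $mB$ asked for in the reduction.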



4.3 With heterogeneous nodes 

Theorem 3. All three instances of the Replica Cost problem with heterogeneous nodes are 
NP-complete. 

Proof. Obviously, the NP-completeness of the Upwards strategy is a consequence of Theorem 2. 
For the other two strategies, the problem clearly belongs to the class NP: given a solution, it 
is easy to verify in polynomial time that all requests are served and that no server capacity is 
exceeded. To establish the completeness, we use a reduction from 2-PARTITION [3]. We consider 
an instance $\mathcal{I}_1$ of 2-PARTITION: given $m$ positive integers $a_1, a_2, \dots, a_m$, does there exist a subset 
$I \subset \{1, \dots, m\}$ such that $\sum_{i \in I} a_i = \sum_{i \notin I} a_i$? Let $S = \sum_{i=1}^{m} a_i$. We build the following instance 
$\mathcal{I}_2$ of Replica Cost (see Figure 8): 

• $m + 1$ clients $c_i$ with $r_i = a_i$ for $1 \le i \le m$ and $r_{m+1} = 1$. 

• $m + 1$ internal nodes: 

  - $m$ nodes $n_j$, $1 \le j \le m$, with $W_j = sc_j = a_j$. 

  - A root node $r$ with $W_r = sc_r = S/2 + 1$. 

  - The only child of $n_j$ is $c_j$. The parent of $n_j$ is $r$. The parent of $c_{m+1}$ is $r$. 

Finally, we ask whether there exists a solution with total storage cost $S + 1$. Clearly, the size of $\mathcal{I}_2$ 
is polynomial (and even linear) in the size of $\mathcal{I}_1$. We now show that instance $\mathcal{I}_1$ has a solution if 
and only if instance $\mathcal{I}_2$ does. The same reduction works for both strategies, Closest and Multiple. 

Suppose first that $\mathcal{I}_1$ has a solution. We assign a replica to each node $n_i$, $i \in I$, and one in the 
root $r$. Client $c_i$ is served by $n_i$ if $i \in I$, and by the root $r$ otherwise, i.e. if $i \notin I$ or if $i = m + 1$. 
The total storage cost is $\sum_{j \in I} W_j + W_r = S + 1$. Because $W_r = S/2 + 1 = \sum_{i \notin I} r_i + r_{m+1}$, the 
capacity of the root is not exceeded. Note that the server allocation is compatible both with the 
Closest and Multiple policies. In both cases, we have a solution to $\mathcal{I}_2$. 

Suppose now that $\mathcal{I}_2$ has a solution. Necessarily, there is a replica located in the root, otherwise 
client $c_{m+1}$ would not be served. Let $I$ be the index set of nodes $n_j$, $1 \le j \le m$, which have been 
allocated a replica in the solution of $\mathcal{I}_2$. For $j \notin I$, there is no replica in node $n_j$, hence all 
requests of client $c_j$ are processed by the root, whose storage capacity is $S/2 + 1$. We derive that 
$\sum_{j \notin I} r_j \le S/2$. Because the total storage cost is $S + 1$, the total storage capacity of nodes in 
$I$ is $S/2$. The proof is slightly different for the two server strategies: 

• For the Closest strategy, all requests from a client $c_j$, $j \in I$, are served by $n_j$, hence $\sum_{j \in I} r_j \le 
S/2$. Since $\sum_{j \in I} r_j + \sum_{j \notin I} r_j = S$, we derive $\sum_{j \in I} r_j = \sum_{j \notin I} r_j = S/2$, hence a solution to 
$\mathcal{I}_1$. 

• For the Multiple strategy, consider a server $j \in I$. Let $r'_j$ be the number of requests from 
client $c_j$ served by $n_j$, and $r''_j$ be the number of requests from $c_j$ served by the root $r$ (of 
course $r_j = r'_j + r''_j$). All requests from a client $c_j$, $j \notin I$, are served by the root. Let 
$A = \sum_{j \in I} r'_j$, $B = \sum_{j \in I} r''_j$ and $C = \sum_{j \notin I} r_j$. The total storage cost is $A + B + S/2 + 1$, 
hence $A + B \le S/2$. We have seen that $C \le S/2$. But $A + B + C = S$, hence $B = 0$, and 
$A = C = S/2$, hence a solution to $\mathcal{I}_1$. 

□ 
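
The forward direction of this reduction can be checked on small inputs with the following Python sketch. It is illustrative only: `reduction_cost` is a hypothetical helper that takes the integers $a_j$ and a candidate index set $I$, places replicas on the nodes $n_j$, $j \in I$, and on the root, and returns the resulting storage cost when the root capacity $S/2 + 1$ is respected. The sketch assumes $S$ is even.

```python
def reduction_cost(a, subset):
    """Storage cost of the solution built from an index subset I (1-based).

    Returns None when the root capacity S/2 + 1 would be exceeded.
    """
    S = sum(a)
    assert S % 2 == 0, "S/2 must be an integer in this sketch"
    w_root = S // 2 + 1
    # Clients c_j with j not in I, plus the extra client c_{m+1} (1 request),
    # are served by the root; clients c_j with j in I are served by n_j.
    root_load = sum(a[j - 1] for j in range(1, len(a) + 1) if j not in subset) + 1
    if root_load > w_root:
        return None
    return sum(a[j - 1] for j in subset) + w_root


a = [3, 1, 1, 2, 2, 1]                 # S = 10
print(reduction_cost(a, {1, 4}))       # balanced subset: 3 + 2 = S/2
print(reduction_cost(a, {1}))          # root overloaded
print(reduction_cost(a, {1, 4, 5}))    # feasible but more expensive than S + 1
```

Only a subset whose elements sum to $S/2$ yields a feasible solution of cost exactly $S + 1$: smaller subsets overload the root, larger ones pay for more storage.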

5 Linear programming formulation 

In this section, we express the Replica Placement optimization problem in terms of an integer 
linear program. We deal with the most general instance of the problem on a heterogeneous tree, 
including QoS constraints, and bounds on resource usage (both server and link capacities). We 
derive a formulation for each of the three server access policies, namely Closest, Upwards and 
Multiple. This is an important extension to a previous formulation due to [8]. 

While there is no efficient algorithm to solve integer linear programs (unless P=NP), this 
formulation is extremely useful as it leads to an absolute lower bound: we solve the integer 
linear program over the rationals, using standard software packages [10]. Of course the rational 
solution will generally not be feasible, as it assigns fractions of replicas to server nodes, but it will provide 
a lower bound on the storage cost of any solution. This bound will be very helpful to assess the 
performance of the polynomial heuristics that are introduced in Section 6. 

5.1 Single server 

We start with single server strategies, namely the Upwards and Closest access policies. We need 
to define a few variables: 

Server assignment 

• $x_j$ is a boolean variable equal to 1 if $j$ is a server (for one or several clients) 

• $y_{i,j}$ is a boolean variable equal to 1 if $j = \text{server}(i)$ 

• If $j \notin \text{Ancestors}(i)$, we directly set $y_{i,j} = 0$. 

Link assignment 

• $z_{i,l}$ is a boolean variable equal to 1 if link $l \in \text{path}[i \to r]$ is used when client $i$ accesses 
its server server($i$) 

• If $l \notin \text{path}[i \to r]$ we directly set $z_{i,l} = 0$. 

The objective function is the total storage cost, namely $\sum_{j \in \mathcal{N}} sc_j\, x_j$. We list below the constraints 
common to the Closest and Upwards policies. First there are constraints for server and 
link usage: 

• Every client is assigned a server: $\forall i \in \mathcal{C},\ \sum_{j \in \text{Ancestors}(i)} y_{i,j} = 1$. 

• All requests from $i \in \mathcal{C}$ use the link to its parent: $z_{i,\, i \to \text{parent}(i)} = 1$. 

• Let $i \in \mathcal{C}$, and consider any link $l : j \to j' = \text{parent}(j) \in \text{path}[i \to r]$. If $j' = \text{server}(i)$ then 
link succ($l$) is not used by $i$ (if it exists). Otherwise $z_{i,\text{succ}(l)} = z_{i,l}$. Thus: 

$$\forall i \in \mathcal{C},\ \forall l : j \to j' = \text{parent}(j) \in \text{path}[i \to r],\quad z_{i,\text{succ}(l)} = z_{i,l} - y_{i,j'}$$ 

Next there are constraints expressing that server capacities and link bandwidths cannot be 
exceeded: 

• The processing capacity of any server cannot be exceeded: $\forall j \in \mathcal{N},\ \sum_{i \in \mathcal{C}} r_i\, y_{i,j} \le W_j\, x_j$. 
Note that this ensures that if $j$ is the server of $i$, there is indeed a replica located in node $j$. 

• The bandwidth of any link cannot be exceeded: $\forall l \in \mathcal{L},\ \sum_{i \in \mathcal{C}} r_i\, z_{i,l} \le BW_l$. 

Finally, it remains to express the QoS constraints: 

$$\forall i \in \mathcal{C},\ \forall j \in \text{Ancestors}(i),\quad \text{dist}(i,j)\, y_{i,j} \le q_i,$$ 

where $\text{dist}(i,j) = \sum_{l \in \text{path}[i \to j]} \text{comm}_l$. As stated previously, we could take the computational time 
of a request into account by writing $(\text{dist}(i,j) + \text{comp}_j)\, y_{i,j} \le q_i$, where $\text{comp}_j$ would be the time 
to process a request on server $j$. 

Altogether, we have fully characterized the linear program for the Upwards policy. We need 
additional constraints for the Closest policy, which is a particular case of the Upwards policy 
(hence all constraints and equations remain valid). 

We need to express that if node j is the server of client i, then no ancestor of j can be the 
server of a client in the subtree rooted at j. Indeed, a client in this subtree would need to be 
served by j and not by one of its ancestors, according to the Closest policy. A direct way to write 
this constraint is 

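$$\forall i \in \mathcal{C},\ \forall j \in \text{Ancestors}(i),\ \forall i' \in \mathcal{C} \cap \text{subtree}(j),\ \forall j' \in \text{Ancestors}(j),\quad y_{i,j} \le 1 - y_{i',j'}.$$ 

Indeed, if $y_{i,j} = 1$, meaning that $j = \text{server}(i)$, then any client $i'$ in the subtree rooted in $j$ must 
have its server in that subtree, not closer to the root than $j$. Hence $y_{i',j'} = 0$ for any ancestor $j'$ 
of $j$. 

There are $O(s^4)$ such constraints to write, where $s = |\mathcal{C}| + |\mathcal{N}|$ is the problem size. We can 
reduce this number down to $O(s^3)$ by writing 

$$\forall i \in \mathcal{C},\ \forall j \in \text{Ancestors}(i) \setminus \{r\},\ \forall i' \in \mathcal{C} \cap \text{subtree}(j),\quad y_{i,j} \le 1 - z_{i',\, j \to \text{parent}(j)}.$$ 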
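
The Closest condition expressed by these constraints can also be tested directly on a candidate single-server assignment. The Python sketch below is a plain checker, not an ILP model: `parent` and `server_of` are assumed encodings of the tree and of the assignment, and the quadratic double loop simply mirrors the first (larger) family of constraints.

```python
def ancestors(parent, node):
    """Proper ancestors of node, from its parent up to the root."""
    result = []
    while node in parent:
        node = parent[node]
        result.append(node)
    return result


def respects_closest(parent, server_of):
    """True if no client below a server j is served by an ancestor of j."""
    for i, j in server_of.items():
        above_j = set(ancestors(parent, j))
        for i2, j2 in server_of.items():
            in_subtree_j = j in ancestors(parent, i2)
            if in_subtree_j and j2 in above_j:
                return False
    return True


# Hypothetical tree: r is the root, a is its child, clients c1, c2 under a, c3 under r.
parent = {"c1": "a", "c2": "a", "a": "r", "c3": "r"}
print(respects_closest(parent, {"c1": "a", "c2": "a", "c3": "r"}))
print(respects_closest(parent, {"c1": "a", "c2": "r", "c3": "r"}))
```

The second assignment is valid for the Upwards policy but not for Closest, since `c2` bypasses the server `a` that sits on its path to the root.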
5.2 Multiple servers 

We now proceed to the Multiple policy. We define the following variables: 
Server assignment 

• Xj is a boolean variable equal to 1 if j is a server (for one or several clients) 

• yij is an integer variable equal to the number of requests from client i processed by 
node j 

• If j ^ Ancests(i), we directly set yi.j = 0. 
Link assignment 

• Zi t i is an integer variable equal to the number of requests flowing through link I G 
path[i — > r] when client i accesses any of its servers in Servers(i) 
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• If I £ path[i — » r] we directly set Zjj = 0. 

The objective function is unchanged, as the total storage cost still writes YljeN 5C j x j- But the 
constraints must be modified. First those for server and link usage: 

• Every request is assigned a server: $\forall i \in C,\ \sum_{j \in \mathrm{Ancestors}(i)} y_{i,j} = r_i$

• All requests from $i \in C$ use the link to its parent: $z_{i,\, i \to \mathrm{parent}(i)} = r_i$

• Let $i \in C$, and consider any link $l : j \to j' = \mathrm{parent}(j) \in \mathrm{path}[i \to r]$. Some of the requests
from i which flow through l will be processed by node j', and the remaining ones will flow
upwards through link succ(l):

$\forall i \in C,\ \forall l : j \to j' = \mathrm{parent}(j) \in \mathrm{path}[i \to r],\quad z_{i,\mathrm{succ}(l)} = z_{i,l} - y_{i,j'}$

The other constraints on server capacities, link bandwidths and QoS are slightly modified: 

• Servers: $\forall j \in N,\ \sum_{i \in C} y_{i,j} \le W_j x_j$. Note that this ensures that if j is the server for one or
more requests from i, there is indeed a replica located in node j.

• Bandwidths: $\forall l \in L,\ \sum_{i \in C} z_{i,l} \le BW_l$

• QoS: $\forall i \in C,\ \forall j \in \mathrm{Ancestors}(i),\ \mathrm{dist}(i,j)\, y_{i,j} \le q_i\, y_{i,j}$

Altogether, we have fully characterized the linear program for the Multiple policy. 
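
As an illustration, the assignment and capacity constraints above can be checked on a candidate solution with a few lines of code. This is only a sketch under stated assumptions: the dictionary-based encoding of the tree (`parent`, `W`, `requests`), the function names and the example values are hypothetical, and the bandwidth and QoS constraints are deliberately left out.

```python
def ancestors(parent, v):
    """Return the strict ancestors of v, from its parent up to the root."""
    result = []
    v = parent.get(v)
    while v is not None:
        result.append(v)
        v = parent.get(v)
    return result


def check_multiple(parent, W, requests, x, y):
    """Check the server-assignment and capacity constraints of the Multiple program.

    x maps a node j to 0/1, y maps a pair (client i, node j) to a request count.
    """
    # y_{i,j} must be non-negative and zero outside Ancestors(i)
    for (i, j), v in y.items():
        if v < 0 or (v > 0 and j not in ancestors(parent, i)):
            return False
    # every request of every client is assigned to some server
    for i, r in requests.items():
        if sum(v for (c, _), v in y.items() if c == i) != r:
            return False
    # server capacity: sum_i y_{i,j} <= W_j * x_j
    for j, cap in W.items():
        load = sum(v for (_, k), v in y.items() if k == j)
        if load > cap * x.get(j, 0):
            return False
    return True


def storage_cost(sc, x):
    """Objective function: sum of sc_j * x_j."""
    return sum(sc[j] * x.get(j, 0) for j in sc)


# Hypothetical instance: root n0 with internal children n1 and n2.
parent = {"n1": "n0", "n2": "n0", "c1": "n1", "c2": "n1", "c3": "n2", "c4": "n0"}
W = {"n0": 6, "n1": 5, "n2": 5}
requests = {"c1": 3, "c2": 4, "c3": 5, "c4": 1}
x = {"n0": 1, "n1": 1, "n2": 1}
# Client c2 is split between n1 (2 requests) and n0 (2 requests).
y = {("c1", "n1"): 3, ("c2", "n1"): 2, ("c2", "n0"): 2,
     ("c3", "n2"): 5, ("c4", "n0"): 1}
valid = check_multiple(parent, W, requests, x, y)
cost = storage_cost(W, x)  # Replica Cost: storage cost taken equal to capacity
```

Removing the replica on `n0` (setting `x["n0"] = 0`) makes the same `y` invalid, since requests would be sent to a node without a replica.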
5.3 An ILP-based lower bound 

The previous linear programs contain boolean or integer variables, because it does not make sense 
to assign half a request or to place one third of a replica on a node. However, we can still relax 
the constraints and solve the linear program assuming that all variables take rational values. The 
optimal solution of the relaxed program can be obtained in polynomial time (in theory using the 
ellipsoid method [11], in practice using standard software packages [TJ 0]), and the value of its 
objective function provides an absolute lower bound on the cost of any valid (integer) solution. 
Of course the relaxation makes the most sense for the Multiple policy, because several fractions of 
servers are assigned by the rational program. While not likely to be achievable, this lower bound 
will provide an absolute reference for the performance of the polynomial heuristics described in 
Section 6.

6 Heuristics for the Replica Cost problem 

In this section several heuristics for the Closest, Upwards and Multiple policies are presented. 
As previously stated, our main objective is to provide an experimental assessment of the relative 
performance of the three access policies. Our first attempt targets heterogeneous trees with neither
QoS nor bandwidth constraints, thus considering the Replica Cost problem, but further work
will be devoted to analyzing the impact of the additional constraints (and in particular of the QoS 
constraints) on the replica costs achieved by each policy. 

All eight heuristics described below have polynomial, and even worst-case quadratic, complexity
$O(s^2)$, where $s = |C| + |N|$ is the problem size. Indeed, all heuristics proceed by traversing
the tree, and the number of traversals is bounded by the number of internal nodes (and is much
lower in practice).

We assume that each node $k \in N \cup C \setminus \{\mathrm{root}\}$ knows its parent(k). Additionally, an internal
node $j \in N$ knows its children(j), and the set clients(j) of the clients in its subtree subtree(j). At
any step of the heuristics, we denote by $inreq_j$ the number of requests in subtree(j) reaching j with
the current replicas already placed (initially, with no replica, $inreq_j = \sum_{i \in \mathrm{clients}(j)} r_i$). We use a
boolean variable $treated_j$ to mark whether a node j has been treated during a tree traversal. The set of
replicas is initialized by replica = $\emptyset$.
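
The initial values of inreq can be obtained by walking from each client up to the root. The sketch below assumes a hypothetical encoding of the tree as a `parent` dictionary and of the clients as a `requests` dictionary; neither the names nor the example tree come from the report.

```python
from collections import defaultdict


def initial_inreq(parent, requests):
    """Return inreq[j] = sum of r_i over the clients i of subtree(j), with no replica placed.

    parent   -- maps every node or client (except the root) to its parent node
    requests -- maps each client i to its number of requests r_i
    """
    inreq = defaultdict(int)
    for client, r in requests.items():
        node = parent.get(client)
        while node is not None:  # walk the path client -> root
            inreq[node] += r
            node = parent.get(node)
    return dict(inreq)


# Hypothetical tree: root n0 with internal children n1 and n2.
parent = {"n1": "n0", "n2": "n0", "c1": "n1", "c2": "n1", "c3": "n2", "c4": "n0"}
requests = {"c1": 3, "c2": 4, "c3": 5, "c4": 1}
inreq = initial_inreq(parent, requests)
```

Each client contributes one walk of length at most the depth of the tree, so this initialization stays within the quadratic bound mentioned above.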






6.1 Closest 

The first two heuristics enforce the Closest policy through a top-down approach, whereas the third 
heuristic uses a bottom-up approach. 

Closest Top Down All (CTDA) - The basic idea is to perform a breadth-first traversal of the
tree. Every time a node is able to process the requests of all the clients in its subtree, the node is
chosen as a server, and we do not explore that subtree any further. The procedure ClosestTopDownAll
(CTDA) is presented in Algorithm 4. It is called repeatedly until a tree traversal adds no more servers.

procedure CTDA (root, replica)
    Fifo fifo;
    fifo.push(root);
    while fifo ≠ ∅ do
        s = fifo.pop();
        if s ∉ replica then
            if W_s ≥ inreq_s & inreq_s > 0 then
                replica = replica ∪ {s};
                foreach a ∈ Ancestors(s) do inreq_a = inreq_a - inreq_s;
            else
                foreach i ∈ children(s) do
                    if i ∈ N then fifo.push(i);
                end
            end
        end
    end

Algorithm 4: Procedure CTDA
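
A possible executable rendering of Algorithm 4 is sketched below. The dictionary-based tree encoding and the example values are hypothetical. One deliberate deviation from the pseudocode is labeled in the code: the sketch also resets inreq of the chosen server to 0, an assumption made here so that the final test on inreq at the root remains meaningful when the root itself becomes a server.

```python
from collections import deque


def ctda_pass(root, children, parent, W, inreq, replica):
    """One breadth-first pass of CTDA; returns True if a server was added."""
    added = False
    fifo = deque([root])
    while fifo:
        s = fifo.popleft()
        if s in replica:
            continue  # the subtree of an existing server is not explored
        if 0 < inreq[s] <= W[s]:
            replica.add(s)
            added = True
            served = inreq[s]
            inreq[s] = 0  # assumption of this sketch, not in Algorithm 4
            a = parent.get(s)
            while a is not None:  # update all ancestors of s
                inreq[a] -= served
                a = parent.get(a)
        else:
            # only internal nodes (those with a capacity) are pushed
            fifo.extend(c for c in children.get(s, ()) if c in W)
    return added


def ctda(root, children, parent, W, inreq):
    """Call the pass until no server is added; return (replica, solution_found)."""
    replica = set()
    while ctda_pass(root, children, parent, W, inreq, replica):
        pass
    return replica, inreq[root] == 0


# Hypothetical tree: root n0 with internal children n1 and n2.
parent = {"n1": "n0", "n2": "n0", "c1": "n1", "c2": "n1", "c3": "n2", "c4": "n0"}
children = {"n0": ["n1", "n2", "c4"], "n1": ["c1", "c2"], "n2": ["c3"]}
W = {"n0": 5, "n1": 10, "n2": 5}
inreq = {"n0": 13, "n1": 7, "n2": 5}
replica, ok = ctda("n0", children, parent, W, inreq)
```

On this instance the first pass selects n1 and n2, the second pass selects the root for the single remaining request, and the third pass adds nothing.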



Closest Top Down Largest First (CTDLF) - The tree is traversed in breadth-first manner
as in CTDA. However, when considering the children of a node, we treat first the subtree which
contains the most requests (we sort the children by non-increasing number of requests inreq before
performing the "fifo.push(i)"). Also, instead of adding all possible servers in a single step, the tree traversal
is stopped as soon as a server that can process all the requests in its subtree has been found.
This is done by adding a return instruction each time a server has been found in the procedure
CTDA (Algorithm 4), just after the update of the inreq values of the server's ancestors. As for
the previous heuristic, the procedure is called until no more servers are chosen. In fact CTDLF is
called exactly |R| times, where R is the final set of replicas.

Closest Bottom Up (CBU) - The last heuristic for the Closest policy performs a bottom-up
traversal of the tree. A node is chosen as a server if it can process all the requests of the clients
in its subtree. Algorithm 5 describes a recursive implementation of ClosestBottomUp (CBU). The
procedure is initially called with the root of the tree; as long as we have not reached the bottom of the tree,
we go down. Once arrived at the bottom, i.e. when the current node s has only clients as children
(test atBottom(s)) or when all its children have already been treated (test allChildrenTreated(s)),
the node is marked as treated and added to the set replica if $W_s \ge inreq_s$. Then we go up in the
tree until all nodes are treated, performing recursive calls.

Each of these three heuristics places a number of replicas, but none guarantees that a
valid solution has been found. We need to check the final value of $inreq_{root}$. If there still
are some pending requests at the root, no valid solution has been found. However, if $inreq_{root} = 0$, the
heuristic has found a solution.

procedure CBU (s ∈ N, replica)
    if atBottom(s) || allChildrenTreated(s) then
        treated_s = true;
        if W_s ≥ inreq_s & inreq_s > 0 then
            /* node can treat all children's requests */
            replica = replica ∪ {s};
            foreach a ∈ Ancestors(s) do inreq_a = inreq_a - inreq_s;
        else
            /* node cannot treat all children's requests, go up in the tree */
            if Ancestors(s) ≠ ∅ then call CBU (parent(s), replica);
        end
    else
        foreach i ∈ children(s) do
            /* not yet at the bottom of the tree, go down */
            if i ∈ N & ¬treated_i then call CBU (i, replica);
        end
    end

Algorithm 5: Procedure CBU
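
The bottom-up idea can also be sketched as a plain post-order traversal, which treats every internal node after all its internal children. This is a simplified reformulation rather than a literal transcription of Algorithm 5: the tree encoding and the example are hypothetical, and, as an assumption of the sketch, inreq of a chosen server is reset to 0 so that the test on the root stays valid when the root is selected.

```python
def cbu(root, children, parent, W, inreq):
    """Bottom-up Closest heuristic as a post-order traversal.

    Returns (replica, solution_found). The inreq dictionary is updated in place.
    """
    replica = set()

    def visit(s):
        # treat the internal children first (bottom-up order)
        for c in children.get(s, ()):
            if c in W:
                visit(c)
        # s becomes a server if it can process all pending requests of its subtree
        if 0 < inreq[s] <= W[s]:
            replica.add(s)
            served = inreq[s]
            inreq[s] = 0  # assumption of this sketch
            a = parent.get(s)
            while a is not None:
                inreq[a] -= served
                a = parent.get(a)

    visit(root)
    return replica, inreq[root] == 0


# Hypothetical tree: root n0 with internal children n1 and n2.
parent = {"n1": "n0", "n2": "n0", "c1": "n1", "c2": "n1", "c3": "n2", "c4": "n0"}
children = {"n0": ["n1", "n2", "c4"], "n1": ["c1", "c2"], "n2": ["c3"]}
inreq0 = {"n0": 13, "n1": 7, "n2": 5}

replica_ok, ok = cbu("n0", children, parent, {"n0": 5, "n1": 10, "n2": 5}, dict(inreq0))
replica_ko, ko = cbu("n0", children, parent, {"n0": 5, "n1": 10, "n2": 4}, dict(inreq0))
```

In the second call, node n2 cannot absorb the 5 requests of its subtree, and the root cannot absorb the 6 requests that reach it, so the heuristic fails. The recursion depth equals the tree depth, so very deep trees would call for an explicit stack.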

6.2 Upwards 

We propose two heuristics for the Upwards policy, the first one using a top-down approach, the 
other considering the clients one by one, by non-increasing order of their number of requests. 

Upwards Top Down (UTD) - The top-down approach works in two passes. In the first pass
(see Algorithm 7), each node $s \in N$ whose capacity is exhausted by the number of requests in its
subtree ($W_s \le inreq_s$) is chosen by traversing the tree in depth-first manner. When a server is
chosen, we delete as many clients as possible in non-increasing order of their number of requests $r_i$,
until the server capacity is reached or no other client can be deleted. This delete procedure is
described in Algorithm 6. If not all requests can be treated by the chosen servers, a second pass
is started. In this UTDSecondPass procedure (see Algorithm 8), servers with remaining requests
are added. Note that none of these servers is exhausted by the remaining requests ($inreq_s < W_s$).
These two procedures are each called only once, with s = root as a parameter.

Similarly to the Closest heuristics, we need to check that $inreq_{root} = 0$ at the end of UTD to
find out whether a valid solution has been found.

procedure deleteRequests (s ∈ N, numToDelete)
    clientList = sortDecreasing(clients(s));
    foreach i ∈ clientList do
        if r_i ≤ numToDelete then
            numToDelete = numToDelete - r_i;
            foreach a ∈ Ancestors(i) do inreq_a = inreq_a - r_i;
            children(parent(i)) = children(parent(i)) \ {i};
            if numToDelete == 0 then return;
        end
    end

Algorithm 6: Procedure deleteRequests
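
The delete procedure can be sketched as follows. The encoding is hypothetical: instead of removing a client from the children list as in Algorithm 6, the sketch records it in an `assigned` dictionary (client to server), which is an assumption made here to keep the example self-contained.

```python
def delete_requests(s, num_to_delete, subtree_clients, parent, requests, inreq, assigned):
    """Assign whole clients of subtree(s) to server s, largest first.

    Returns the capacity of s left unused. inreq and assigned are updated in place.
    """
    by_size = sorted(subtree_clients[s], key=lambda c: requests[c], reverse=True)
    for i in by_size:
        if i in assigned or requests[i] > num_to_delete:
            continue  # already served, or too large for the remaining capacity
        num_to_delete -= requests[i]
        assigned[i] = s
        a = parent.get(i)
        while a is not None:  # the requests of i no longer reach its ancestors
            inreq[a] -= requests[i]
            a = parent.get(a)
        if num_to_delete == 0:
            break
    return num_to_delete


# Hypothetical tree: root n0 with internal children n1 and n2.
parent = {"n1": "n0", "n2": "n0", "c1": "n1", "c2": "n1", "c3": "n2", "c4": "n0"}
requests = {"c1": 3, "c2": 4, "c3": 5, "c4": 1}
subtree_clients = {"n0": ["c1", "c2", "c3", "c4"], "n1": ["c1", "c2"], "n2": ["c3"]}
inreq = {"n0": 13, "n1": 7, "n2": 5}
assigned = {}
left = delete_requests("n0", 8, subtree_clients, parent, requests, inreq, assigned)
```

With a capacity of 8, the root takes c3 (5 requests), skips c2 (4 requests do not fit in the 3 remaining), and then takes c1 (3 requests), which exhausts its capacity.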



procedure UTDFirstPass (s ∈ N, replica)
    if inreq_s ≥ W_s & inreq_s > 0 then
        replica = replica ∪ {s};
        treated_s = true;
        deleteRequests(s, W_s);
    end
    foreach i ∈ children(s) do
        if i ∈ N then UTDFirstPass (i, replica);
    end

Algorithm 7: Procedure UTDFirstPass

procedure UTDSecondPass (s ∈ N, replica)
    if s ∉ replica & inreq_s > 0 then
        replica = replica ∪ {s};
        deleteRequests(s, inreq_s);
    else
        foreach i ∈ children(s) do
            if i ∈ N & inreq_i > 0 then UTDSecondPass (i, replica);
        end
    end

Algorithm 8: Procedure UTDSecondPass

Upwards Big Client First (UBCF) - The second heuristic for the Upwards policy works in
a completely different way than all the other heuristics. The basic idea here is to treat all clients in
non-increasing order of their number of requests $r_i$. For each client we identify the server with minimal current
capacity (in the path from the client to the root) that can treat all its requests. The capacity of a
server is decreased each time it is assigned some requests to process. If there is no valid server to
assign to a given client, the heuristic has failed to find a valid solution. Please refer to Algorithm 9
for details.

procedure UBCF (s ∈ N, replica)
    clientList = sortDecreasing(clients(s));
    foreach i ∈ clientList do
        ValidAncests = {a ∈ Ancestors(i) | W_a ≥ r_i};
        if ValidAncests ≠ ∅ then
            a = argmin {W_j | j ∈ ValidAncests};
            if a ∉ replica then replica = replica ∪ {a};
            W_a = W_a - r_i;
        else return no solution;
        end
    end

Algorithm 9: Procedure UBCF
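
A compact executable sketch of UBCF is given below. The dictionary-based encoding and the example capacities are hypothetical; the function returns `None` where Algorithm 9 returns "no solution", and additionally returns the server chosen for each client, which the pseudocode does not record.

```python
def ubcf(parent, W, requests):
    """Upwards Big Client First.

    Returns (replica, server) where server maps each client to its single server,
    or None if some client cannot be assigned.
    """
    capacity = dict(W)  # remaining capacity of each node
    replica, server = set(), {}
    for i in sorted(requests, key=requests.get, reverse=True):
        # ancestors of i whose remaining capacity can hold all requests of i
        valid = []
        a = parent.get(i)
        while a is not None:
            if capacity[a] >= requests[i]:
                valid.append(a)
            a = parent.get(a)
        if not valid:
            return None
        best = min(valid, key=capacity.get)  # tightest fit
        replica.add(best)
        capacity[best] -= requests[i]
        server[i] = best
    return replica, server


# Hypothetical tree: root n0 with internal children n1 and n2.
parent = {"n1": "n0", "n2": "n0", "c1": "n1", "c2": "n1", "c3": "n2", "c4": "n0"}
requests = {"c1": 3, "c2": 4, "c3": 5, "c4": 1}
result = ubcf(parent, {"n0": 6, "n1": 5, "n2": 5}, requests)
failure = ubcf(parent, {"n0": 2, "n1": 5, "n2": 5}, requests)
```

In the first call, c3 and c2 fill n2 and most of n1, so c1 has to go up to the root; in the second call the root is too small for c1 and the heuristic fails.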



6.3 Multiple 

We propose three heuristics for the Multiple policy. The first one uses a top-down approach, the 
second one a bottom-up approach. The last one performs a greedy bottom-up traversal of the 
tree. 



Multiple Top Down (MTD) - The top-down approach for the Multiple policy is similar to
the top-down approach for Upwards, with one significant difference: the delete procedure. For
Upwards, requests of a client have to be treated by a single server, and it may occur that after
the delete procedure a server still has some capacity left to treat more requests, but all remaining
clients have a higher amount of requests than this leftover capacity. For Multiple, requests of a
client can be treated by multiple servers. So if at the end of the delete procedure the server still
has some capacity, we delete this amount of requests from the client with the largest $r_i$. This
modified delete procedure is described in Algorithm 10.

procedure deleteRequestsInMTD (s ∈ N, numToDelete)
    clientList = sortDecreasing(clients(s));
    foreach i ∈ clientList do
        if r_i ≤ numToDelete then
            numToDelete = numToDelete - r_i;
            foreach a ∈ Ancestors(i) do inreq_a = inreq_a - r_i;
            children(parent(i)) = children(parent(i)) \ {i};
        else
            r_i = r_i - numToDelete;
            foreach a ∈ Ancestors(i) do inreq_a = inreq_a - numToDelete;
            return;
        end
    end

Algorithm 10: Procedure deleteRequestsInMTD
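
The difference with the Upwards delete procedure is that a client may be served partially. A minimal sketch, with a hypothetical encoding in which `remaining` holds the requests of each client not yet served and `y` records the amounts $y_{i,s}$ sent to server s:

```python
def delete_requests_multiple(s, num_to_delete, subtree_clients, parent, remaining, inreq, y):
    """Serve up to num_to_delete requests of subtree(s) on server s, largest client first.

    A client may be split: the last client considered only gives what still fits.
    Returns the capacity of s left unused.
    """
    by_size = sorted(subtree_clients[s], key=lambda c: remaining[c], reverse=True)
    for i in by_size:
        if num_to_delete == 0:
            break
        amount = min(remaining[i], num_to_delete)  # whole client, or the part that fits
        if amount == 0:
            continue
        remaining[i] -= amount
        num_to_delete -= amount
        y[(i, s)] = y.get((i, s), 0) + amount
        a = parent.get(i)
        while a is not None:
            inreq[a] -= amount
            a = parent.get(a)
    return num_to_delete


# Hypothetical tree: root n0 with internal children n1 and n2.
parent = {"n1": "n0", "n2": "n0", "c1": "n1", "c2": "n1", "c3": "n2", "c4": "n0"}
remaining = {"c1": 3, "c2": 4, "c3": 5, "c4": 1}
subtree_clients = {"n0": ["c1", "c2", "c3", "c4"], "n1": ["c1", "c2"], "n2": ["c3"]}
inreq = {"n0": 13, "n1": 7, "n2": 5}
y = {}
left = delete_requests_multiple("n0", 8, subtree_clients, parent, remaining, inreq, y)
```

With a capacity of 8, the root serves c3 entirely (5 requests) and 3 of the 4 requests of c2; the last request of c2 remains to be served by another node.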



Multiple Bottom Up (MBU) - The first pass of this heuristic performs a bottom-up traversal
of the tree, as in CBU. During this traversal, nodes $s \in N$ are added to the set replica if their
capacity is exhausted ($W_s \le inreq_s$), similarly to the first pass of the MTD procedure. The delete
procedure is identical to the MTD delete procedure (Algorithm 10), except that clients are deleted
in non-decreasing order of their $r_i$ values (instead of the non-increasing order). Intuitively, we aim
at deleting many small clients rather than fewer demanding ones. The MBUFirstPass is described
in Algorithm 11, and the MBUSecondPass, which adds extra servers if required (similarly to the
second pass of MTD), is described in Algorithm 12.

procedure MBUFirstPass (s ∈ N, replica)
    if atBottom(s) || allChildrenTreated(s) then
        treated_s = true;
        if W_s ≤ inreq_s & inreq_s > 0 then
            /* node is exhausted by the requests of its clients */
            replica = replica ∪ {s};
            deleteRequestsInMBU(s, W_s);
        else
            /* node is not exhausted, go up the tree */
            if Ancestors(s) ≠ ∅ then call MBUFirstPass (parent(s), replica);
        end
    else
        /* not yet at the bottom of the tree, go down */
        foreach i ∈ children(s) do
            if i ∈ N & ¬treated_i then call MBUFirstPass (i, replica);
        end
    end

Algorithm 11: Procedure MBUFirstPass



procedure MBUSecondPass (s ∈ N, replica)
    if s ∉ replica & inreq_s > 0 then
        replica = replica ∪ {s};
        deleteRequestsInMBU(s, inreq_s);
    else
        foreach i ∈ children(s) do
            if i ∈ N & inreq_i > 0 then MBUSecondPass (i, replica);
        end
    end

Algorithm 12: Procedure MBUSecondPass

Multiple Greedy (MG) - The last heuristic performs a greedy bottom-up assignment of
requests, similarly to Pass 3 of the optimal algorithm for the homogeneous case (see Algorithm 3).
We add a replica whenever some requests are assigned to a server. For
heterogeneous platforms, we may often return a cost far from the optimal, but we ensure that we
always find a solution to the problem if there exists one.

It might be particularly interesting to use MG only for problem instances for which MBU or
MTD fail to find a solution.

7 Experiments: comparisons of different access policies 

We have conducted experiments to assess the impact of the different access policies, and the
performance of the polynomial heuristics described in Section 6. We obtain an absolute lower
bound of the solution for each tree platform with a linear program similar to those of Section 5,
but modified so as to solve larger problems. Section 7.1 details how we compute this lower bound.
We outline the experimental plan in Section 7.2. Results are given and commented in Section 7.3.
In the following, we denote by s the problem size: $s = |C| + |N|$.

7.1 Obtaining a lower bound 

The linear programs presented in Section 5 must be solved in integer values if we wish to obtain an
exact solution to an instance of the problem. This can be done for each access policy, but due to 
the large number of variables, the problem cannot be solved for platforms of size s > 50. Thus we 
cannot use this approach for large-scale problems. 

For all practical values of the problem size, the rational linear program returns a solution
quickly: we tested up to several thousand nodes and clients, and we always found a
solution within ten seconds.

However, we can obtain a more precise lower bound for trees with up to s = 400 nodes and
clients by solving the Multiple instance of the linear program with fewer integer
variables. We treat the $y_{i,j}$ and $z_{i,l}$ as rational variables, and only require the $x_j$ to be integer
variables. These variables are set to 1 if and only if there is a replica on the corresponding node.
Thus, forbidding $0 < x_j < 1$ allows us to get a realistic value of the cost of a solution of the
problem. For instance, a server might be used at only 50% of its capacity, so that setting $x_j = 0.5$
would be enough to ensure that all requests are processed; but in this case, the cost of placing
the replica at this node is halved, which is incorrect: we can either place a replica or not, but it is
impossible to place half of a replica.
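
As a small worked illustration (the numbers are hypothetical and not taken from the experiments), consider a single node j with capacity $W_j = 10$ and storage cost $sc_j = 10$, serving one client i with $r_i = 5$ requests:

```latex
\begin{align*}
\text{server constraint:}\quad & y_{i,j} = 5 \le W_j x_j = 10\, x_j \;\Longrightarrow\; x_j \ge 0.5,\\
\text{fully rational program:}\quad & x_j = 0.5, \qquad \text{cost} = sc_j\, x_j = 5,\\
\text{refined program } (x_j \in \{0,1\}):\quad & x_j = 1, \qquad \text{cost} = sc_j\, x_j = 10.
\end{align*}
```

Here the refined bound (10) coincides with the cost of the only valid placement, whereas the fully rational bound (5) is half of it.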

In practice, this lower bound provides a drastic improvement over the unreachable lower bound 
provided by the fully rational linear program. The good news is that we can compute the refined 
lower bound for problem sizes up to s = 400, using GLPK [1]. We used the refined bound for all
our experiments. 

7.2 Experimental plan 

The important parameter in our tree networks is the load, i.e. the total number of requests
compared to the total processing power:

$\lambda = \frac{\sum_{i \in C} r_i}{\sum_{j \in N} W_j}$

We have performed experiments on 30 trees for each of the nine values of $\lambda$ selected ($\lambda =
0.1, 0.2, \ldots, 0.9$). The trees have been randomly generated, with a problem size $15 \le s \le 400$.
When $\lambda$ is small, the tree has a light request load, while large values of $\lambda$ imply a heavy load on
the servers. We then expect the problem to have a solution less frequently.

We have computed the number of solutions for each $\lambda$ and each heuristic. The number
of solutions obtained by the linear program indicates which problems are solvable. Of course we
cannot expect our heuristics to return a result for problems that admit no solution.

To assess the relative cost of each heuristic, we have studied the distance of the result (in terms
of replica cost) of the heuristic to the lower bound. This allows us to compare the cost of the different
heuristics, and thus to compare the different access policies. For each $\lambda$, the cost is computed on
the trees for which the linear program has a solution. Let $T_\lambda$ be the subset of trees with a solution.
Then, the relative cost for the heuristic h is obtained by:

$rcost = \frac{1}{|T_\lambda|} \sum_{t \in T_\lambda} \frac{cost_{LP}(t)}{cost_h(t)}$

where $cost_{LP}(t)$ is the lower bound cost returned by the linear program on tree t, and $cost_h(t)$ is
the cost of the solution proposed by heuristic h. In order to be fair to heuristics that
have a higher success rate, we set $cost_h(t) = +\infty$ if the heuristic did not find any solution.
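
The relative cost is straightforward to compute once the costs are collected. In the sketch below, the function name, the dictionary encoding and the sample costs are hypothetical; a failed heuristic is encoded as `None` and contributes 0 to the sum, as prescribed by the convention $cost_h(t) = +\infty$.

```python
import math


def relative_cost(lp_cost, heuristic_cost):
    """Average of cost_LP(t) / cost_h(t) over the trees t for which the LP has a solution.

    lp_cost        -- maps each solvable tree t to its lower bound cost_LP(t)
    heuristic_cost -- maps a tree t to cost_h(t), or to None if the heuristic failed
    """
    total = 0.0
    for t, lower_bound in lp_cost.items():
        cost = heuristic_cost.get(t)
        cost_h = math.inf if cost is None else cost
        total += lower_bound / cost_h  # a failure contributes 0
    return total / len(lp_cost)


# Hypothetical costs on three solvable trees; the heuristic fails on t3.
lp_cost = {"t1": 8, "t2": 10, "t3": 6}
heuristic_cost = {"t1": 10, "t2": 10, "t3": None}
rcost = relative_cost(lp_cost, heuristic_cost)
```

The three trees contribute 0.8, 1 and 0 respectively, so the relative cost is their average; a heuristic that fails often is thus penalized even if its successful runs are close to the bound.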

Experiments have been conducted both on homogeneous networks (Replica Counting prob- 
lem) and on heterogeneous ones (Replica Cost problem). 



7.3 Results 

A solution computed by a Closest or Upwards heuristic is always a solution for the Multiple
policy, since the latter is less constrained. Therefore, we can mix results into a new heuristic for
the Multiple policy, called MixedBest (MB), which selects for each tree the best cost returned by
the previous eight heuristics for this particular problem instance. Since MG never fails to find a
solution if there is one, MB never fails either.

Figure 9 shows the percentage of success of each heuristic for homogeneous platforms. The
upper curve corresponds to the result of the linear program, and to the MG and
MB heuristics, which confirms that they always find a solution when there is one. The UBCF
heuristic seems very efficient, since it finds a solution more often than MTD and MBU, two of the
heuristics for the Multiple policy. On the contrary, UTD, which works in a similar way to MTD and MBU,
finds fewer solutions than these two heuristics, since it is further constrained by the Upwards policy.
As expected, all the Closest heuristics find fewer solutions as soon as $\lambda$ reaches higher values:
the bottom curve of the plot corresponds to CTDA, CTDLF and CBU, which all find the same
solutions. This is inherent to the limitation of the Closest policy: when the number of requests
is high compared to the total processing power in the tree, there is little chance that a server can
process all the requests coming from its subtree, and requests cannot traverse this server to be
served higher in the tree. These results confirm that the new policies have a striking impact on
the existence of a solution to the Replica Counting problem.

Figure 10 represents the relative cost of the heuristics compared to the LP-based lower bound.
As expected, the hierarchy between the policies is respected, i.e. Multiple is better than Upwards
which in turn is better than Closest. For small values of $\lambda$, it happens that some Closest heuristics
give a better solution than those for Upwards or Multiple, due to the fact that the latter heuristics
are not well optimized for small values of $\lambda$. Also, UBCF is better than all the Multiple heuristics
for $\lambda = 0.6$. Altogether, the use of the MixedBest heuristic MB allows us to always pick the best
result, thereby resulting in a very satisfying relative cost for the Multiple instance of the problem.
The greedy MG should not be used for small values of $\lambda$, but proves to be very efficient for large
values, since it is the only heuristic to find a solution for such instances. To conclude, we point out
that MB always achieves a relative cost of at least 85%, thus returning a replica cost within roughly 18%
(1/0.85 ≈ 1.18) of the LP-based lower bound. This is a very satisfactory result for the absolute performance
of our heuristics.

The heterogeneous results (see Figure 11 and Figure 12) are very similar to the homogeneous
ones, which clearly shows that our heuristics are not very sensitive to the heterogeneity of the
platform. Therefore, we have an efficient way to find in polynomial time a good solution to all the
NP-hard problems stated in Section 4.

Figure 9: Homogeneous case - Percentage of success.

Figure 10: Homogeneous case - Relative cost.

Figure 11: Heterogeneous case - Percentage of success.

Figure 12: Heterogeneous case - Relative cost.

8 Extensions

In this paper we have considered a simplified instance of the replica problem. In this section, 
we outline two important generalizations, namely dealing with several objects, and changing the 
objective function. 

8.1 With several objects 

In this paper, we have restricted the study of the problem to a single object, which means that
all replicas are identical (of the same type). We can envision a system in which different types
of objects need to be accessed. The clients then issue requests of different types, which can
be served only by an appropriate replica. Thus, for an object of type k, client $i \in C$ issues $r_i^{(k)}$
requests for this object. To serve a request of type k, a node must be provided with a replica
of that type. Nodes can be provided with several replica types. A given client is likely to have
different servers for different objects. The QoS may also be object-dependent ($q_i^{(k)}$).

To refine further, new parameters can be introduced such as the size of object k and the computation
time involved for this object. Node parameters become object-dependent too, in particular
the storage cost and the time required to answer a request.

The server capacity constraint must then be a sum over all the object types, while the QoS must
be satisfied for each object type. The link capacity is also a sum over the different object types,
taking into account the size of each object.

It remains to modify the objective function: we simply aim at minimizing the cost of all
replicas of different types that have been assigned to the nodes in the solution to get the extended
replica cost for several objects.

Because the constraints add up linearly for different objects, it is not difficult to extend the
linear programming formulation of Section 5 to deal with several objects. Also, the three access
policies Closest, Upwards and Multiple could naturally be extended to handle several objects.
However, designing efficient heuristics for various object types, especially with different
communication to computation ratios and different QoS constraints for each type, is a challenging
algorithmic problem.

8.2 More complex objective functions 

Several important extensions of the problem consist in having a more complex objective function.
In fact, with either one or several objects, we have so far restricted ourselves to minimizing the cost of
the replicas (and even their number in the homogeneous case). However, several other factors can
be introduced in the objective function:

Communication cost - This cost is the read cost, i.e. the communication cost required to
access the replicas to answer requests. It is thus a sum over all objects and all clients of
the communication time required to access the replica. If we take this criterion into account
in the objective function, we may prefer a solution in which replicas are close to the clients.

Update cost - The write cost is the extra cost due to an update of the replicas. An update must
be performed when one of the clients is modifying (writing) some of the data. In this case,
to ensure the consistency of the data, we need to propagate the modification to all other
replicas of the modified object. Usually, this cost is directly related to the communication
costs on the minimum spanning tree of the replicas, since the replica which has been modified
sends the information to all the other replicas.

Linear combination - A quite general objective function can be obtained by a linear combination
of the three different costs, namely replica cost, read cost and write cost. Informally,
such an objective function would be written

$\alpha \sum_{\text{servers, objects}} \text{replica cost} \;+\; \beta \sum_{\text{requests}} \text{read cost} \;+\; \gamma \sum_{\text{updates}} \text{write cost}$

where the application-dependent parameters $\alpha$, $\beta$ and $\gamma$ would be used to give priorities to
the different costs.

Again, designing efficient heuristics for such general objective functions, especially in the con- 
text of heterogeneous resources, is a challenging algorithmic problem. 

9 Related work 

Early work on replica placement by Wolfson and Milo [13] has shown the impact of the write cost 
and motivated the use of a minimum spanning tree to perform updates between the replicas. In 
this work, they prove that the replica placement problem in a general graph is NP-complete, even 
without taking into account storage costs. Thus they address the case of special topologies, and 
in particular tree networks. They give a polynomial solution in a fully homogeneous case and a 
simple model with no QoS and no server capacity. Their work uses the closest server access policy 
(single server) to access the data. 

Using this Closest policy, Cidon et al [2] studied an instance of the problem with multiple 
objects. In this work, the objective function has no update cost, but integrates a communication 
cost. Communication cost in the objective function can be seen as a substitute for QoS. Thus, 
they minimize the average communication cost for all the clients rather than ensuring a given 
QoS for each client. They target fully homogeneous platforms since there are no server capacity 
constraints in their approach. A similar instance of the problem has been studied by Liu et al [9], 
adding a QoS in terms of a range limit (QoS=distance), and the objective being the Replica 
Counting problem. In this latter approach, the servers are homogeneous, and their capacity is 
bounded. 

Cidon et al [2] and Liu et al [9] both use the Closest access policy. In each case, the optimization 
problems are shown to have polynomial complexity. However, the variant with bidirectional links 
is shown NP-complete by Kalpakis et al [5]. Indeed in [5], requests can be served by any node 
in the tree, not just the nodes located in the path from the client to the root. The simple 
problem of minimizing the number of replicas with identical servers of fixed capacity, without any
communication cost or QoS constraints, directly reduces to the classical bin packing problem.

Kalpakis et al [5] show that a special instance of the problem is polynomial, when considering 
no server capacities, but with a general objective function taking into account read, write and 
storage costs. In their work, a minimum spanning tree is used to propagate the writes, as was 
done in [13]. Different methods can however be used, such as a minimum cost Steiner tree, in 
order to further optimize the write strategy [6]. 

All papers listed above consider the Closest access policy. As already stated, most problems 
are NP-complete, except for some very simplified instances. Karlsson et al [3 [7] compare different 
objective functions and several heuristics to solve these complex problems. They do not take QoS 
constraints into account, but instead integrate a communication cost in the objective function as 
was done in [2]. Integrating the communication cost into the objective function can be viewed as 
a Lagrangian relaxation of QoS constraints. 

Tang and Xu [12] were among the first authors to introduce actual QoS constraints in the
problem formalization. In their approach, the QoS corresponds to the latency requirements of each 
client. Different access policies are considered. First, a replica-aware policy in a general graph is 
proven to be NP-complete. When the clients do not know where the replicas are (replica-blind 
policy), the graph is simplified to a tree (fixed routing scheme) with the Closest policy, and in this 
case again it is possible to find a polynomial algorithm using dynamic programming. 

To the best of our knowledge, there is no related work comparing different access policies,
either on tree networks or on general graphs. Most previous works impose the Closest policy.
The Multiple policy is enforced by Rodolakis et al [10] but in a very different context. In fact,
they consider general graphs instead of trees, so they face the combinatorial complexity of finding
good routing paths. Also, they assume an unlimited capacity at each node, since they can add
numerous servers of different kinds on a single node. Finally, they include some QoS constraints
in their problem formulation, based on the round trip time (in the graph) required to serve the
client requests. In such a context, this (very particular) instance of the Multiple problem is shown
to be NP-hard.

10 Conclusion 

In this paper, we have introduced and extensively analyzed two important new policies for the 
replica placement problem. The Upwards and Multiple policies are natural variants of the standard 
Closest approach, and it may seem surprising that they have not already been considered in the 
published literature. 

On the theoretical side, we have fully assessed the complexity of the Closest, Upwards and 
Multiple policies, both for homogeneous and heterogeneous platforms. The polynomial complexity 
of the Multiple policy in the homogeneous case is quite unexpected, and we have provided an 
elegant algorithm to compute the optimal cost for this policy. Not surprisingly, all three policies 
turn out to be NP-complete for heterogeneous nodes, which provides yet another example of the 
additional difficulties induced by resource heterogeneity. 

On the practical side, we have designed several heuristics for the Closest, Upwards and Multiple 
policies, and we have compared their performance for a simple instance of the problem, with
neither QoS constraints nor bandwidth limitations. In the experiments, the constraints were only related
to server capacities, and the total cost was the sum of the server capacities (or their number in 
the homogeneous case). Even in this simple setting, the impact of the new policies is impressive: 
the number of trees which admit a solution is much higher with the Upwards and Multiple policies 
than with the Closest policy. Finally, we point out that the absolute performance of the heuristics 
is quite good, since their cost is close to the lower bound based upon the solution of the integer 
linear program. 

There remains much work to extend the results of this paper, in several important directions. 
In the short term, we need to conduct more simulations for the Replica Cost problem, varying 
the shape of the trees, the distribution law of the requests and the degree of heterogeneity of the 
platforms. We also aim at designing efficient heuristics for more general instances of the Replica 
Placement problem, taking QoS and bandwidth constraints into account. It will be instructive 
to see whether the superiority of the new Upwards and Multiple policies over Closest remains so 
important in the presence of QoS constraints. Also, including bandwidth constraints may require 
a better global load-balancing along the tree, thereby favoring Multiple over Upwards. 

In the longer term, designing efficient heuristics for the problem with various object types, all
with different communication to computation ratios and different QoS constraints, is a demanding
algorithmic problem. Also, we would like to extend this work so as to handle more complex
objective functions, including communication costs and update costs as well as replica costs; this
seems to be a very difficult challenge to tackle, especially in the context of heterogeneous resources.